Abstract

In this paper, a logic program synthesis method from first 
order logic specifications is described. The specifications 
are described by Horn clauses extended by universally 
quantified implicational formulae. Those formulae are 
transformed into definite clause programs by meaning- 
preserving unfold/fold transformation. We show some 
classes of first order formulae which can be successfully 
transformed into definite clauses automatically by un- 
fold/fold transformation. 

1 Introduction 

Logic program synthesis based on unfold/fold transformation
[1] is a standard method and has been investigated
by many researchers [2, 3, 5, 6, 11, 12, 19]. As
for the correctness of unfold/fold rules in logic program- 
ming, Tamaki and Sato proposed meaning-preserving 
unfold/fold rules for definite clause programs [20]. Then, 
Kanamori and Horiuchi proposed unfold/fold rules for a 
class of first order formulae [7]. Recently, Sato proposed 
unfold/fold rules for full first order formulae [18]. 

In the studies of program synthesis, unfold/fold rules 
are used to eliminate quantifiers by folding to obtain def- 
inite clause programs from first order formulae. How- 
ever, in most of those studies, unfold/fold rules were ap- 
plied nondeterministically and general methods to derive 
definite clauses were not known. Recently, Dayantis [3]
showed a deterministic method to derive logic programs 
from a class of first order formulae. Sato and Tamaki [19] 
also showed a deterministic method by incorporating the 
concept of continuation. 

This paper shows another characterization of classes of 
first order formulae from which definite clause programs 
can be derived automatically. Those formulae are de- 
scribed by Horn clauses extended by universally quanti- 
fied implicational formulae. As for transformation rules, 
Kanamori and Horiuchi’s unfold/fold rules are adopted. 
A synthesis procedure based on unfold/fold rules is given, 
and with some syntactic restrictions, those formulae are 
successfully transformed into equivalent definite clause 
programs. This study is also an extension of those by
Pettorossi and Proietti [14, 15, 16] on logic program
transformations.

The rest of this paper is organized as follows. Section 
2 describes unfold/fold rules and formalizes the synthesis 
process. Section 3 describes a program synthesis proce- 
dure and proves that definite clause programs can be suc- 
cessfully derived from some classes of first order formulae 
using this procedure. Section 4 discusses the relations to 
other works and Section 5 gives a conclusion. 

In the following, familiarity with the basic terminology
of logic programming is assumed [13]. As syntactical
variables, $X, Y, Z, U, V$ are used for variables, $A, B, H$
for atoms and $F, G$ for formulae, possibly with primes
and subscripts. In addition, $\theta$ is used for a substitution,
$F\theta$ for the formula obtained from formula $F$ by applying
substitution $\theta$, $\bar{X}$ for a vector of variables and $F_G[G']$ for
the replacement of an occurrence of subformula $G$ of formula
$F$ with formula $G'$.

2 Unfold/Fold Transformation 
for Logic Program Synthesis 

In this section, preliminary notions of our logic program 
synthesis are shown. 

2.1 Preliminaries 

Preliminary notions are described first. 

A formula is called an implicational goal when it is of
the form $F_1 \to F_2$, where $F_1$ and $F_2$ are conjunctions of
atoms.

Definition 2.1 Definite Formula 

Formula $C$ is called a definite formula when $C$ is of
the form

$A \leftarrow G_1 \land G_2 \land \cdots \land G_n \quad (n > 0)$,

where $G_i$ is a (possibly universally quantified) conjunction
of implicational goals for $i = 1, 2, \ldots, n$. $A$ is called
the head of $C$, $G_1 \land G_2 \land \cdots \land G_n$ is called the body of
$C$ and each $G_i$ is called a goal in the body of $C$.
Note that the notion of a definite formula is a restricted
form of that in [7]. 

A set of definite formulae is called a definite formula 
program, while a set of definite clauses is called a definite 
clause program. We may simply say programs instead of 
definite formula (or clause) programs when it is obvious 
to which we are referring. 

Definition 2.2 Definition Formula 

Let $P$ be a definite formula program. A definite formula
$D$ is called a definition formula for $P$ when all the
predicates appearing in $D$'s body are defined by definite
clauses in $P$ and the predicate of $D$'s head does not appear
in $P$. The predicate of $D$'s head is called a new
predicate, while those defined by definite clauses in $P$
are old predicates. A set of formulae $\mathcal{D}$ is called a definition
formula set for $P$ when every element $D$ of $\mathcal{D}$ is
a definition formula for $P$ and the predicate of $D$'s head
appears only once in $\mathcal{D}$.

Atoms with new predicates are called new atoms, while 
those with old predicates are called old atoms. 

2.2 Unfold/Fold Transformation 

In this subsection, unfold/fold transformation rules are 
shown following [7]. Below, we assume that the logical 
constant true implicitly appears in the body of every unit 
clause. Further, we assume that a goal is always deleted 
from the body of a definite formula when it is the logical 
constant true, and a definite formula is always deleted 
when some goal in its body is the logical constant false. 

Further, we introduce the reduction of implicational
goals with logical constants true and false, such as
$\neg\,true \Rightarrow false$, $true \land F \Rightarrow F$, and so on. (See [7] for
details.) Let $G$ be an implicational goal. The reduced
form of $G$, denoted by $G{\downarrow}$, is the normal form in the
above reduction system.

Variables not quantified in formula F are called global 
variables of F. Atoms appearing positively (negatively) 
in formula F are called positive (negative) atoms of F. 

Definition 2.3 Positive Unfolding 

Let $P_i$ be a program, $C$ be a definite formula in $P_i$,
$G$ be a goal in the body of $C$ and $A$ be a positive old
atom of $G$ containing no universally quantified variable.
Then, let $G_0$ be $G_A[false]{\downarrow}$, and $C'_0$ be the definite formula
obtained from $C$ by replacing $G$ with $G_0$. Further,
let $C_1, C_2, \ldots, C_k$ be all the definite clauses in $P_i$ whose
heads are unifiable with $A$, say by mgu's $\theta_1, \theta_2, \ldots, \theta_k$.
Let $G_j$ be the reduced form of $G\theta_j$ after replacing $A\theta_j$ in
$G\theta_j$ with the body of $C_j\theta_j$, and $C'_j$ be the definite formula
obtained from $C\theta_j$ by replacing $G\theta_j$ in the body with $G_j$.
(New variables introduced from $C_j$ are global variables
of $G_j$.) Then, $P_{i+1} = (P_i - \{C\}) \cup \{C'_0, C'_1, C'_2, \ldots, C'_k\}$.
$C'_0, C'_1, C'_2, \ldots, C'_k$ are called the results of positive unfolding
$C$ at $A$ (or $G$).



Example 2.1 Let P be a definite clause program as follows:

C1 : list([]).
C2 : list([X|L]) ← list(L).
C3 : 0 < suc(Y).
C4 : suc(X) < suc(Y) ← X < Y.
C5 : member(U,[U|L]).
C6 : member(U,[V|L]) ← member(U,L).

Let C7 be a definition formula for P as follows:

C7 : less-than-all(X,L) ←
       list(L) ∧ ∀Y (member(Y,L) → X < Y).

Suppose that P0 = P ∪ {C7}. Then, by unfolding C7 at
list(L), program P1 = P ∪ {C8, C9} is obtained, where

C8 : less-than-all(X,[]) ← ∀Y (member(Y,[]) → X < Y).
C9 : less-than-all(X,[Z|L]) ←
       list(L) ∧ ∀Y (member(Y,[Z|L]) → X < Y).

Before showing the negative unfolding rule, we intro- 
duce the notion of terminating atoms. Intuitively, atom 
A is terminating when every derivation path of A is fi- 
nite. See [7] for the precise definition. 

Definition 2.4 Negative Unfolding 

Let $P_i$ be a program, $C$ be a definite formula in $P_i$, $G$
be a goal in the body of $C$ and $A$ be a negative old atom
of $G$ such that every atom obtained from $A$ by instantiating
all global variables in $A$ to ground is terminating.
Let $C_1, C_2, \ldots, C_k$ be all the definite clauses in $P_i$ whose
heads are unifiable with $A$, say by mgu's $\theta_1, \theta_2, \ldots, \theta_k$,
where $\theta_j$ instantiates no global variable in $G$. Let $G_0$ be
$G_A[false]{\downarrow}$, and $G_j$ be the reduced form of $G\theta_j$ after replacing
$A\theta_j$ in $G\theta_j$ with the body of $C_j\theta_j$. (New variables
introduced from $C_j$ are universally quantified variables in
$G_j$.) Let $C'$ be the definite formula obtained from $C$ by
replacing $G$ in the body of $C$ with $G_0 \land G_1 \land \cdots \land G_k$.
Then, $P_{i+1} = (P_i - \{C\}) \cup \{C'\}$. $C'$ is called the result
of negative unfolding $C$ at $A$ (or $G$).

Example 2.2 Let P and P1 be the programs in Example 2.1.
By unfolding C8 at member(Y,[]), P2 = P ∪ {C9, C10}
is obtained, where

C10 : less-than-all(X,[]) ← ∀Y (false → X < Y)↓.

that is,

C10 : less-than-all(X,[]).

Further, by unfolding C9 at member(Y,[Z|L]), P3 = P ∪
{C10, C11} is obtained, where

C11 : less-than-all(X,[Z|L]) ← list(L) ∧
        ∀Y (false → X < Y)↓ ∧
        ∀Y (true → X < Z)↓ ∧
        ∀Y (member(Y,L) → X < Y)↓.

that is,

C11 : less-than-all(X,[Z|L]) ← list(L) ∧
        X < Z ∧ ∀Y (member(Y,L) → X < Y).

Definition 2.5 Folding 

Let $P_i$ be a definite formula program, $C$ be a definite
formula in $P_i$ of the form $A \leftarrow K \land L$ and $D$ be a definite
formula of the form $B \leftarrow K'$, where $K$, $K'$ and $L$ are
conjunctions of goals. Suppose that there exists a substitution
$\theta$ such that $K'\theta = K$ holds. Let $C'$ be a clause
of the form $A \leftarrow B\theta \land L$. Then $P_{i+1} = (P_i - \{C\}) \cup \{C'\}$.

Note that when applying folding, some conditions have 
to be satisfied to preserve the meanings of programs. See 
[7] for details. 

Example 2.3 Let P and P3 be the programs in Example 2.2.
By folding C11 by C7, P4 = P ∪ {C10, C12} is
obtained, where

C12 : less-than-all(X,[Y|L]) ←
        X < Y ∧ less-than-all(X,L).
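
As an informal check of Examples 2.1 - 2.3, the specification C7 and the synthesized clauses C10 and C12 can be read as two Python functions and compared on ground inputs. This is only a sketch under stated assumptions: natural numbers are modelled as Python integers rather than suc-terms, lists as Python lists (so list(L) holds trivially), and the function names are hypothetical.

```python
def less_than_all_spec(x, xs):
    """Specification C7: every member y of the list xs satisfies x < y.

    Assumption: numbers are Python ints, not suc-terms.
    """
    return all(x < y for y in xs)


def less_than_all(x, xs):
    """Synthesized clauses C10 and C12, read as a recursive function."""
    if not xs:
        # C10: less-than-all(X, []).
        return True
    head, tail = xs[0], xs[1:]
    # C12: less-than-all(X, [Y|L]) <- X < Y and less-than-all(X, L).
    return x < head and less_than_all(x, tail)
```

The quantifier of the specification has disappeared from the second function; it is replaced by structural recursion on the list, which is the effect of the final folding step.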

2.3 Program Synthesis by Unfold/Fold 
Transformation 

In this subsection, our program synthesis problem is for- 
malized. Firstly, several notions are defined to formalize 
the program synthesis processes. 

Definition 2.6 Descendant and Ancestor Formula 
Let P be a definite formula program, C be a definite 
formula in P and P' be a definite formula program ob- 
tained from P by successively applying positive or nega- 
tive unfolding to P. A definite formula C' in P' is called 
a descendant formula of C when 

(a) C' is identical to C, or

(b) C' is the result of positive or negative unfolding of
a descendant formula of C.

Conversely, C is called an ancestor formula of C'.

Example 2.4 In Examples 2.1 - 2.3, definite formulae
C7, C8, ..., C11 are descendant formulae of C7.

Definition 2.7 U-selection Rule 

A rule that determines what transformation should be 
applied to a definite formula program is called a selection 
rule. Let P be a definite formula program and C be a 
definite formula in P. A selection rule R is called a U- 
selection rule for P rooted on C when R always selects 
positive or negative unfolding applied to a descendant 
formula of C. C is called the root formula for R (or 
of the transformation.) A definite formula program ob- 
tained from P by successively applying transformation 
rules according to R is called a definite formula program 
obtained from P via R. 

Definition 2.8 Closed Program 

Let $P$ be a definite clause program, $C$ be a definition
formula for $P$, $\mathcal{D}$ be a definition formula set for $P$ and $R$
be a U-selection rule for $P \cup \{C\}$ rooted on $C$. Let $P'$ be
a definite formula program obtained from $P \cup \{C\}$ via $R$.
$P'$ is said to be closed with respect to the triple $\langle P, C, \mathcal{D} \rangle$
when every descendant formula $C'$ of $C$ in $P'$ satisfies
one of the following:

(a) $C'$ is a definite clause.

(b) There exists a goal $G$ consisting of positive atoms
only in the body of $C'$ such that an old atom in $G$ is
not unifiable with the head of any definite clause in $P'$.

(c) By successively folding $C'$ by clauses in $\{C\} \cup \mathcal{D}$, a
definite clause can be obtained.

$P \cup \{C\}$ is said to be closed with respect to $\mathcal{D}$ when there
exists a closed program with respect to $\langle P, C, \mathcal{D} \rangle$ and
for every definition formula $D$ in $\mathcal{D}$ there exists a closed
program with respect to $\langle P, D, \mathcal{D} \cup \{C\} \rangle$.

Example 2.5 Let P and P3 be the programs in Example 2.2.
Then, P3 is closed w.r.t. ⟨P, C7, ∅⟩. Further,
P ∪ {C7} is closed w.r.t. ∅.

The above framework is an extension of the one shown
in [8], and also a modification of the one Pettorossi and
Proietti proposed [14, 15, 16] in their studies of program
transformation.

Now, our problem can be formalized as follows: for a
given definite clause program $P$ and definition formula
$C$ for $P$, find a finite definition formula set $\mathcal{D}$ for $P$ such
that $P \cup \{C\}$ is closed with respect to $\mathcal{D}$.

3 Some Classes of First Order 
Formulae from Which Logic 
Programs Can Be Derived 

In this section, we specify some classes of first order formulae
from which definite clause programs can be derived
by unfold/fold transformation.

3.1 A Program Synthesis Procedure 

In this subsection, we show a naive program synthesis 
procedure. In the following, we borrow some notions 
about programs in [15, 16]. We consider definite formula 
(clause) programs with predicate =, which have no ex- 
plicit definition in the programs. Predicate = is called 
a base predicate , while other predicates are called de- 
fined predicates. Atoms with base predicates are called 
base atoms, while those with defined predicates are called 
defined atoms. Transformation rules can be applied to 
defined atoms only. 

A formula containing base atoms can be reduced by
unifying arguments of =. When a universally quantified
variable and a global variable are unified, the global
variable is substituted for the universal one. The above
reduction is called the reduction with respect to =. We
assume that no formulae are reduced w.r.t. = unless this
is explicitly mentioned.

Further, we assume that the following operations are
always applied implicitly to the results of positive or negative
unfolding. A goal $G$ is said to be connected when
at most one universally quantified implicational goal $G'$
appears in $G$ and each atom in $G'$ has common universally
quantified variables with at least one other atom
in $G'$. Let $C$ be a definite formula such that all the goals
in its body are connected. Let $C'$ be one of the results of
positive or negative unfolding $C$ at some goal. By logical
deduction, definite formulae $C'_1, C'_2, \ldots, C'_m$ $(m > 1)$ are
obtained from $C'$ such that all the goals in the body of
$C'_i$ are connected. (Note that when some goal $G$ in the body of
$C'$ is of the form $F_1 \to F_2$ or $F_1 \lor F_2$ and no universally
quantified variables appear in both $F_1$ and $F_2$, $C'$ can be
split into two formulae by replacing $G$ in $C'$ with $\neg F_1$
(or $F_1$) and $F_2$.)

Before showing our program synthesis procedure, a no- 
tion is defined. 

Definition 3.1 Sound Unfolding 

Suppose that positive or negative unfolding is applied 
to a definite formula at atom A. Then, the application 
of unfolding is said to be sound when no two distinct 
universally quantified variables in A are unified when 
reducing the result of unfolding with respect to =. 

Some syntactic restrictions on programs ensure the 
soundness of all possible applications of unfolding. In 
fact, the restriction shown in [3] ensures the soundness. 
However, in the following, we assume that every applica- 
tion of unfolding is sound, without giving any syntactic 
restriction, for simplicity. 

Now, we show our program synthesis procedure, which
is similar to partial evaluation procedures (cf. [9, 10]).
First, a procedure to synthesize new predicates is shown. 

Procedure 3.1 Synthesis of New Predicates 
Suppose that definite formula program $P$ and definite
formula $C$ in $P$ of the form $A \leftarrow G_1, G_2, \ldots, G_n$ are
given. Let $G'_i$ be the reduced formula obtained from $G_i$
by removing all base atoms and by replacing all universally
quantified variables appearing in every base atom
with distinct fresh global variables if global variables are
substituted for them when reducing $G_i$ w.r.t. =. Let $D_i$
be of the form $H_i \leftarrow G'_i$ for $i = 1, 2, \ldots, n$, where $H_i$ is
an atom whose predicate does not appear in $P$ or $H_j$ for
$i \neq j$ and whose arguments are all global variables of $C$
appearing in $G'_i$. Then, $D_1, D_2, \ldots, D_n$ are returned.

Note that in Procedure 3.1, $C$ can be folded by
$D_1, D_2, \ldots, D_n$ after reducing it w.r.t. = when $C$ is the
result of sound unfolding, and the result of the folding is
a definite clause.

Example 3.1 Let P be a program as follows.

C1 : all-less-than(L,M) ← list(L) ∧ list(M) ∧
       ∀U,V (member(U,L) ∧ member(V,M) → U < V).
C2 : member(U,[V|X]) ← U = V.
C3 : member(U,[V|X]) ← member(U,X).

The definition of '<' is given in Example 2.1. Suppose
that C1's body consists of only one goal. By applying
positive unfolding and negative unfolding to C1 successively,
the following formulae are obtained. (The reduction
w.r.t. = is done when no universally quantified variable
appears as an argument of =.)

C4 : all-less-than([],M) ← list(M).
C5 : all-less-than([X|L],M) ← (list(L) ∧ list(M)) ∧
       (list(L) ∧ list(M) ∧
        ∀U,V (U = X ∧ member(V,M) → U < V)) ∧
       (list(L) ∧ list(M) ∧
        ∀U,V (member(U,L) ∧ member(V,M) → U < V)).

Then, by Procedure 3.1, the following new predicates are
defined from C5.

D1 : new1(X,L,M) ← list(L) ∧ list(M) ∧
       ∀V (member(V,M) → X < V).
D2 : new2(L,M) ← list(L) ∧ list(M) ∧
       ∀U,V (member(U,L) ∧ member(V,M) → U < V).

Next, the whole procedure for program synthesis is 
shown. 

Procedure 3.2 A Program Synthesis Procedure
Suppose that definite clause program $P$ and definition
formula $C$ for $P$ are given. Let $\mathcal{D}$ be the set $\{C\}$.

(a) If there exist no unmarked formulae in $\mathcal{D}$, then return
$P$ and stop.

(b) Select an unmarked definition formula $D$ from $\mathcal{D}$.
Mark $D$ 'selected.' Let $P'$ be the set $\{D\}$.

(c) If there exist no formulae in $P'$ which do not satisfy
conditions (a) and (b) in Definition 2.8, then $P := P \cup P'$
and go to (a).

(d) Select a definite formula $C'$ from $P'$. Apply positive
or negative unfolding to $C'$. Let $C_1, \ldots, C_n$ be the
results. Remove $C'$ from $P'$.

(e) Apply Procedure 3.1 to $C_1, \ldots, C_n$. Let $D_1, \ldots, D_m$
be the outputs. Add $D_i$ to $\mathcal{D}$ if it is not a definite clause
and there exists no formula in $\mathcal{D}$ which is identical to $D_i$
except for the predicate of the head. Fold $C_1, \ldots, C_n$
by the formulae in $\mathcal{D}$ and add the results to $P'$.

(f) Go to (c).

Example 3.2 Consider the program in Example 3.1
again. We see that D2 is identical to C1 except for the
predicate of the head. C5 can be folded by D1 and C1
after reduction w.r.t. =. The result is as follows.

C6 : all-less-than([X|L],M) ← list(L) ∧ list(M) ∧
       new1(X,L,M) ∧ all-less-than(L,M).

Similar operations are applied to D1, and finally, the
following clauses are obtained.

D3 : new1(X,L,[]) ← list(L).
D4 : new1(X,L,[Y|M]) ← X < Y ∧ new1(X,L,M).
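
The outcome of Examples 3.1 and 3.2 can be illustrated by reading the derived clauses C4, C6, D3 and D4 as Python functions and comparing them with the quantified specification C1 on ground inputs. This is a sketch, not part of the synthesis procedure itself; it assumes that numbers are Python integers, that lists are Python lists (so the list(L) and list(M) conditions hold trivially), and the function names are hypothetical.

```python
def all_less_than_spec(ls, ms):
    """Specification C1: u < v for every member u of ls and v of ms."""
    return all(u < v for u in ls for v in ms)


def new1(x, ls, ms):
    """Clauses D3 and D4: x is less than every member of ms.

    The argument ls is carried along unchanged, as in the derived clauses.
    """
    if not ms:
        # D3: new1(X, L, []) <- list(L).
        return True
    # D4: new1(X, L, [Y|M]) <- X < Y and new1(X, L, M).
    return x < ms[0] and new1(x, ls, ms[1:])


def all_less_than(ls, ms):
    """Clauses C4 and C6."""
    if not ls:
        # C4: all-less-than([], M) <- list(M).
        return True
    # C6: all-less-than([X|L], M) <- new1(X, L, M) and all-less-than(L, M).
    return new1(ls[0], ls[1:], ms) and all_less_than(ls[1:], ms)
```

The doubly quantified goal of C1 has been split into two singly recursive predicates, one per quantified variable, which mirrors the two new predicates introduced by Procedure 3.1.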

Note that Procedure 3.2 does not necessarily derive 
a definite clause program from a definite formula pro- 
gram. For example, when the following program is given 
as input, Procedure 3.2 does not halt. 

C1 : p(X,Y) ← p(X,Z) ∧ p(Z,Y).

C2 : h(X,Y) ← ∀Z (p(X,Z) → p(Y,Z)).
3.2 Classes of First Order Formulae

In this section, we show some classes of definite formula 
programs which can be transformed into equivalent def- 
inite clause programs by Procedure 3.2. 

Throughout this subsection, we assume that unfolding 
is always applicable to every definite formula at an atom 
when there exist definite clauses whose heads are unifi- 
able with the atom. Note that the above assumption 
does not always hold. This problem will be discussed 
in 3.3. 

After giving a notion, we show a theorem which is an 
extension of the results shown in [15]. A simple expres- 
sion is either a term or an atom. 

Definition 3.2 Depth of Symbol in Simple Expression
Let $X$ be a variable or a constant and $E$ be a simple
expression in which $X$ appears. The depth of $X$ in $E$,
denoted by $depth(X, E)$, is defined as follows.

(a) $depth(X, X) = 1$.

(b) $depth(X, E) = \max\{depth(X, t_i) \mid X \text{ appears in } t_i \text{ for } i = 1, \ldots, n\} + 1$,
if $E$ is either $f(t_1, \ldots, t_n)$ or $p(t_1, \ldots, t_n)$,
for any function symbol $f$ or any predicate symbol $p$.

The depth of the deepest variable or constant in $E$ is denoted by
$maxdepth(E)$.
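
The depth measure of Definition 3.2 can be sketched in a few lines of Python. The term representation is an assumption of this sketch: variables and constants are strings, and a compound term or atom is a tuple whose first element is the function or predicate symbol and whose remaining elements are the arguments. The names `occurs`, `depth` and `maxdepth` are illustrative.

```python
def occurs(x, e):
    """True when the variable or constant x appears in expression e."""
    if not isinstance(e, tuple):
        return e == x
    return any(occurs(x, arg) for arg in e[1:])


def depth(x, e):
    """Depth of x in e: 1 for x itself, otherwise 1 + the maximum depth
    of x over the arguments of e in which x appears."""
    if not occurs(x, e):
        raise ValueError("symbol does not appear in the expression")
    if not isinstance(e, tuple):
        return 1
    return 1 + max(depth(x, arg) for arg in e[1:] if occurs(x, arg))


def maxdepth(e):
    """Depth of the deepest variable or constant in e."""
    if not isinstance(e, tuple):
        return 1
    return 1 + max((maxdepth(arg) for arg in e[1:]), default=0)


# member(U, [V|L]), with the list constructor written as 'cons'
atom = ("member", "U", ("cons", "V", "L"))
```

With this encoding, `depth("U", atom)` is 2 and `depth("V", atom)` is 3, so `maxdepth(atom)` is 3.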

Theorem 3.1 Let $P$ be a definite clause program. Suppose
that for any definition formula $C$ for $P$, there exists
a U-selection rule $R$ for $P \cup \{C\}$ rooted on $C$ such that $R$
is defined for all descendant clauses of $C$ in which at least
one defined atom appears. Suppose also that there exist
two positive integers $H$ and $W$ such that every descendant
clause $C'$ of $C$ in every program $P'$ obtained from
$P \cup \{C\}$ via $R$ satisfies the following two conditions.

(a) The depth of every term appearing in every goal in
the body of $C'$ is less than $H$.

(b) Let $G_1, G_2, \ldots, G_n$ be connected goals in the body
of $C'$. Then, the number of atoms appearing in $G_i$ is
less than $W$, for $i = 1, 2, \ldots, n$.

Then, there exists a finite definition formula set $\mathcal{D}$ for $P$
such that $P \cup \{C\}$ is closed with respect to $\mathcal{D}$.

Proof. From hypothesis (a), only a finite number of distinct
atoms (modulo renaming of variables) can appear
in the goals of all the descendant formulae of $C$. Then,
apply Procedure 3.2 to $P$ and $C$. Note that every goal in
the body of every descendant formula of $C$ is connected.
Then, for every goal of every descendant formula of $C$,
the number of atoms appearing in the goal is less than
$W$, from hypothesis (b). Hence, only a finite number of
distinct goals can appear in all the descendant formulae
of $C$. Thus, we can obtain a finite definition formula
set $\mathcal{D}_0$ for $P$ such that there exists a closed program $P'$
w.r.t. $\langle P, C, \mathcal{D}_0 \rangle$.

The above discussion holds for all the definition formulae
in $\mathcal{D}_0$, since those formulae are constructed from
bodies of the descendant formulae of $C$. Evidently, only
a finite number of distinct definition formulae can be defined.
Thus, there exists a finite definition formula set $\mathcal{D}$
for $P$ such that $P \cup \{C\}$ is closed w.r.t. $\mathcal{D}$. □

Theorem 3.1 shows that Procedure 3.2 can derive a 
definite clause program when (a) a term of infinite depth 
can not appear, or (b) an infinite number of atoms can 
not appear in a connected goal during a transformation 
process. In the following, we show some syntactic restric- 
tions on programs which satisfy the above conditions. 

Proietti and Pettorossi showed some classes of definite 
clause programs which satisfy the conditions in Theo- 
rem 3.1 in their studies of program transformation [15]. 
We show that some extensions of their results are appli- 
cable to our problem. 

The following definitions follow [15]. The set
of variables occurring in a simple expression $E$ is denoted
by $vars(E)$.

Definition 3.3 Linear Term Formula and Program 

A simple expression or a formula is said to be linear 
when no variable appears in it more than once. A definite 
formula (clause) is called a linear term formula (clause) 
when every atom appearing in it is linear. A definite 
formula (clause) program is called a linear term program 
when it consists of linear term formulae (clauses) only. 

A linear term formula (clause) is called a strongly linear
term formula (clause) when its body is linear. A definite
formula (clause) program is called a strongly linear
term program when it consists of strongly linear term
formulae (clauses) only.

Note that the following definite clause is not a linear 
term clause. 

member(X,[X|L]). 

However, it is easy to obtain an equivalent linear term 
clause as follows : 

member(X,[Y|L]) ← X = Y.

Definition 3.4 A Relation $\preceq$ between Linear Simple
Expressions

Let $E_1$ and $E_2$ be linear simple expressions. When
$depth(X, E_1) \le depth(X, E_2)$ holds for every variable $X$ in
$vars(E_1) \cap vars(E_2)$, we write $E_1 \preceq E_2$. (Both $E_1 \preceq E_2$ and
$E_2 \preceq E_1$ hold when $vars(E_1) \cap vars(E_2) = \emptyset$.)

Definition 3.5 Non- Ascending Formula and Program 

Let $C$ be a linear term formula and $H$ be the head of
$C$. $C$ is said to be non-ascending when $A \preceq H$ holds
for every defined atom $A$ appearing in the body of $C$. A
linear term program is said to be non-ascending when it
consists of non-ascending formulae only.

A definite formula (clause) is said to be strongly non-ascending
when it is a strongly linear term formula
(clause) and non-ascending. A definite formula (clause)
program is said to be strongly non-ascending when it
consists of strongly non-ascending formulae (clauses)
only.
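
The relation of Definition 3.4 and the non-ascending condition of Definition 3.5 can be checked mechanically. The following is a minimal sketch under several assumptions that are not in the text: terms are nested tuples `(symbol, arg1, ..., argn)`, variables are strings beginning with an upper-case letter, the depth comparison is taken to be non-strict, and the helper names are hypothetical. Linearity of the expressions is assumed rather than checked.

```python
def variables(e):
    """Set of variables (upper-case-initial strings) occurring in e."""
    if isinstance(e, tuple):
        found = set()
        for arg in e[1:]:
            found |= variables(arg)
        return found
    return {e} if e[:1].isupper() else set()


def depth(x, e):
    """Depth of variable x in e; x is assumed to occur in e."""
    if not isinstance(e, tuple):
        return 1
    return 1 + max(depth(x, arg) for arg in e[1:] if x in variables(arg))


def preceq(e1, e2):
    """e1 is related to e2 when no shared variable is deeper in e1 than in e2."""
    shared = variables(e1) & variables(e2)
    return all(depth(x, e1) <= depth(x, e2) for x in shared)


def non_ascending(head, defined_body_atoms):
    """Every defined body atom must be related to the head."""
    return all(preceq(atom, head) for atom in defined_body_atoms)


# member(U, [V|L]) <- member(U, L)
head = ("member", "U", ("cons", "V", "L"))
body = [("member", "U", "L")]
```

Here `non_ascending(head, body)` holds, whereas a clause such as p(X) ← p(f(X)) fails the check because X is deeper in the body than in the head.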

Definition 3.6 Synchronized Descent Rule 

Let $P$ be a linear term program, $R$ be a U-selection
rule for $P$ and $C$ be any descendant formula of the root
formula for $R$. Let $A_1, A_2, \ldots, A_n$ be all the atoms appearing
in the body of $C$. Then, $R$ is called a synchronized
descent rule when

(a) $R$ selects the application of positive or negative unfolding
to $C$ at $A_i$ if and only if $A_j \preceq A_i$ holds for
$j = 1, \ldots, n$, and

(b) $R$ is not defined for $C$, otherwise.

Note that synchronized descent rules are not neces- 
sarily defined uniquely for given programs and definition 
formulae. 

The following lemma is an extension of the one shown
in [15, 16].

Lemma 3.2 Let $P$ be a non-ascending definite clause
program, $C$ be a linear term definition formula for $P$, and
$R$ be a synchronized descent rule rooted on $C$. Let $P'$ be
a program obtained from $P \cup \{C\}$ via $R$. For each defined
atom $A$ appearing in the body of every descendant clause
of $C$ in $P'$, the following holds:

$maxdepth(A) \le \max\{maxdepth(B) \mid B \text{ is a defined atom in } P \cup \{C\}\}$

Proof. By induction on the number of applications of 
unfolding. □ 

Now we show some classes of definite formula programs 
which satisfy the hypotheses of Theorem 3.1. In the fol- 
lowing, for simplicity, we deal with definition formulae 
with only one universally quantified implicational goal 
in the body. The results are easily extended to the defi- 
nite formulae with a conjunction of universally quantified 
implicational goals. 

The following results are also extensions of those 
shown in [15]. 

Theorem 3.3 Let $P$ be a strongly non-ascending definite
clause program and $C$ be a linear term definition
formula for $P$ of the form $H \leftarrow A_1 \land \forall\bar{X}(A_2 \to A_3)$, such
that the following hold.

(a) For every clause $D$ in $P$ of the form
$H_D \leftarrow B_1 \land \cdots \land B_n \land B'_1 \land \cdots \land B'_m$,
where $B_1, \ldots, B_n$ are defined atoms
and $B'_1, \ldots, B'_m$ are base atoms, the following hold.

(a-1) Let $t_H$ be any argument of $H_D$. For every argument
$t_i$ of $B_i$, if $t_H$ contains a common variable with
$t_i$, then $t_i$ is a subterm of $t_H$.

(a-2) For every argument $t_i$ of $B_i$, if $t_i$ is a subterm
of an argument $t_H$ of $H_D$, then no other argument of
$B_i$ is a subterm of $t_H$.

(b) There exist two arguments $t_i$ and $s_i$ of some $A_i$ ($t_i \neq s_i$,
$i = 1, 2$ or $3$) such that the following hold.

(b-1) There exists an argument $t_j$ of $A_j$ ($i \neq j$) such
that

• $vars(A_i) \cap vars(A_j) = vars(t_i) \cap vars(t_j)$, and
• either $t_i$ is a subterm of $t_j$, $t_j$ is a subterm of $t_i$ or
$vars(t_i) \cap vars(t_j) = \emptyset$.

(b-2) There exists an argument $s_k$ of $A_k$ ($k \neq i, j$)
such that the same relations as above hold for $s_i$ and
$s_k$.

(b-3) $A_j$ contains no common variable with $A_k$.

Then, there exists a definition formula set $\mathcal{D}$ for $P$ such
that $P \cup \{C\}$ is closed with respect to $\mathcal{D}$.

Proof. Note that there exists an atom $A$ in the body of $C$
s.t. an argument of $A$ is a maximal term in the body of
$C$ w.r.t. the subterm ordering relation. Let $C'$ be any result
of unfolding $C$ at $A$ and $G$ be any connected goal in the
body of $C'$ of the form $F_1 \land \forall\bar{X}(F_2 \to F_3)$, where $F_i$ is a
conjunction of atoms. Then, from the hypothesis, it can
be shown that a similar property to hypothesis (b) holds
for $G$. Note that the number of implicational goals does
not increase by applying positive unfolding and no global
variables are instantiated by applying negative unfolding.
Then, again there exists an atom in the body of $C'$ s.t.
one of its arguments is a maximal term in the body of
$C'$ w.r.t. the subterm ordering relation. By induction on
the number of applications of unfolding, a synchronized
descent rule can be defined for every descendant formula
of $C$. Then, from Lemma 3.2, the depth of every term
appearing in every descendant clause of $C$ is bounded.

Note that the number of different subterms of a term
is bounded. Then, from the hypothesis, the number of
atoms appearing in every connected goal in the body of
every descendant formula of $C$ is bounded. Thus, $P$ and
$C$ satisfy the hypotheses of Theorem 3.1. Hence, there
exists a definition formula set $\mathcal{D}$ for $P$ such that $P \cup \{C\}$
is closed with respect to $\mathcal{D}$. □

Note that Theorem 3.3 holds for any nondeterministic 
choice of synchronized descent rules in the above proof. 
Note also that any program can be modified to satisfy 
hypothesis (a) of Theorem 3.3 by introducing atoms with 
= in the body. 

Corollary 3.4 Let $P$ be a strongly non-ascending definite
clause program and $P'$ be a definite clause program
such that no predicate appears in both $P$ and $P'$. Let
$C$ be a linear term definition formula for $P \cup P'$ of the
form $H \leftarrow A_1 \land \forall\bar{X}(A_2 \to A_3)$, where the predicates of
$A_1$ and $A_2$ are defined in $P$ and that of $A_3$ is defined in
$P'$. Suppose that the following hold.

(a) Hypothesis (a) of Theorem 3.3 holds for every clause
$D$ in $P$.

(b) There exist arguments $t_1$ of $A_1$ and $t_2$ of $A_2$ such
that the following hold.

(b-1) $vars(A_1) \cap vars(A_2) = vars(t_1) \cap vars(t_2)$.

(b-2) Either $t_1$ is a subterm of $t_2$, $t_2$ is a subterm of $t_1$
or $vars(t_1) \cap vars(t_2) = \emptyset$.

(c) No variable in $A_3$ is instantiated by applying positive
or negative unfolding to $C$ successively.

Then, there exists a definition formula set $\mathcal{D}$ for $P \cup P'$
such that $P \cup P' \cup \{C\}$ is closed with respect to $\mathcal{D}$.

Proof. Suppose that unfolding is never applied at $A_3$. A
synchronized descent rule can be defined by neglecting
$A_3$. Since variables in $A_3$ are never instantiated, no other
atoms are derived from $A_3$. Thus, the corollary holds. □

In Corollary 3.4, no restrictions are required on the
definition of $A_3$. This result corresponds to that in [3].
Note that any program can be modified to satisfy hy- 
pothesis (c) of Corollary 3.4 by introducing atoms with 
= in the body. 

Example 3.3 The program and the definition formula 
in Example 2.1 satisfy the hypotheses of Theorem 3.3 and 
Corollary 3.4, if clause C5 is replaced with the equivalent 
clause : 

C' : member(U,[V|L]) ← U = V.

In fact, a definite clause program can be obtained, as 
shown in subsection 2.2. 

Next, we show an extension of the results shown in
Theorem 3.3. Let $P$ be a non-ascending definite clause
program and $C$ be a definition formula for $P$ of the form
$H \leftarrow A \land \forall\bar{X}(F_1 \to F_2)$, where $A$ is an atom, and $F_1$ and
$F_2$ are conjunctions of atoms. Let $D_i$ be the definition
clause for $P$ of the form $H_i \leftarrow F_i$ for $i = 1, 2$. If $F_i$
can be transformed into a set of definite clauses which
satisfies the hypotheses of Theorem 3.3, by replacing $F_i$
with $H_i$, we can show that $P \cup \{C\}$ can be transformed
into an equivalent definite clause program.

The above problem is related to the foldability problem
in [16]. The foldability problem is described informally
as follows. Let $P$ be a definite clause program and
$C$ be a definition clause for $P$. Then, find a program $P'$
obtained from $P \cup \{C\}$ which satisfies the following: for
every descendant clause $C'$ of $C$ in $P'$, there exists an ancestor
clause $D$ of $C'$ such that the body of $C'$ is an instance
of the body of $D$.

Proietti and Pettorossi showed some classes of definite
clause programs for which the foldability problem can be
solved [16]. We show that their results are also applicable
to our problem.

A definite clause program P is said to be linear recur- 
sive when at most one defined atom appears in the body 
of each clause in P . Note that a linear recursive and 
linear term program (clause) is a strongly linear term 
program (clause). 

Lemma 3.5 Let P be a linear recursive non-ascending
program and C be a non-ascending definition clause for
P of the form $H \leftarrow A_1 \wedge A_2 \wedge B_1 \wedge \ldots \wedge B_n$, where $A_1$
and $A_2$ are defined atoms and $B_1, \ldots, B_n$ are base atoms.
Suppose that the following hold.

(a) For every clause D in P of the form $H_D \leftarrow A_D \wedge
B'_1 \wedge \ldots \wedge B'_n$, where $A_D$ is the only defined atom in
the body of D, the following hold.

(a-1) Let $t_H$ be any argument of $H_D$. For every
argument $t_A$ of $A_D$, if $t_H$ contains a common variable
with $t_A$, then $t_A$ is a subterm of $t_H$.

(a-2) For every argument $t_A$ of $A_D$, if $t_A$ is a subterm
of an argument $t_H$ of $H_D$, then no other argument of
$A_D$ is a subterm of $t_H$.

(b) There exist arguments $t_1$ of $A_1$ and $t_2$ of $A_2$ such
that the following hold.

(b-1) $\mathrm{vars}(A_1) \cap \mathrm{vars}(A_2) = \mathrm{vars}(t_1) \cap \mathrm{vars}(t_2)$.

(b-2) Either $t_1$ is a subterm of $t_2$, $t_2$ is a subterm of $t_1$,
or $\mathrm{vars}(t_1) \cap \mathrm{vars}(t_2) = \emptyset$.

Then, from $P \cup \{C\}$, we can obtain a linear recursive
non-ascending program which defines the predicate of H
by unfold/fold transformation.

Proof. As shown in [16], we can get a solution of the 
foldability problem for P and C. Then, obviously, a 
linear recursive program is obtained. □ 
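Hypothesis (b) of Lemma 3.5 is purely syntactic, so it can be checked mechanically. The following Python sketch is an illustration only and is not taken from the paper or from [16]; the term encoding (variables as capitalised strings, compound terms and atoms as tuples whose first element is the functor) and the names `variables`, `is_subterm` and `hypothesis_b` are assumptions made for the example.

```python
def is_var(t):
    # Assumed encoding: a variable is a string starting with an uppercase letter.
    return isinstance(t, str) and t[:1].isupper()

def variables(t):
    """Set of variables occurring in a term or atom."""
    if is_var(t):
        return {t}
    if isinstance(t, tuple):
        out = set()
        for arg in t[1:]:
            out |= variables(arg)
        return out
    return set()

def subterms(t):
    """Yield t and all of its subterms."""
    yield t
    if isinstance(t, tuple):
        for arg in t[1:]:
            yield from subterms(arg)

def is_subterm(s, t):
    return any(s == u for u in subterms(t))

def hypothesis_b(a1, a2):
    """Return a pair (t1, t2) of arguments witnessing (b-1) and (b-2), or None."""
    shared = variables(a1) & variables(a2)
    for t1 in a1[1:]:
        for t2 in a2[1:]:
            common = variables(t1) & variables(t2)
            b1 = shared == common
            b2 = is_subterm(t1, t2) or is_subterm(t2, t1) or not common
            if b1 and b2:
                return (t1, t2)
    return None

# Body atoms of the clause csub(X,Y,Z) <- subseq(X,Y), subseq(X,Z).
witness = hypothesis_b(("subseq", "X", "Y"), ("subseq", "X", "Z"))
print(witness)
```

For the two `subseq` atoms, the shared variable X is itself an argument of both atoms, so the pair (X, X) serves as a witness. Hypothesis (a) would need a similar check over every clause of P and is not covered by this sketch.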

Example 3.4 Let P be a linear recursive non-ascending
program as follows.

C1 : subseq([],L).
C2 : subseq([X|L],[Y|M]) <- X = Y ∧ subseq(L,M).
C3 : subseq([X|L],[Y|M]) <- subseq([X|L],M).

Let C be a non-ascending definition clause for P as follows.

C : csub(X,Y,Z) <- subseq(X,Y) ∧ subseq(X,Z).

Then, $P \cup \{C\}$ can be transformed into a linear recursive
non-ascending program as follows.

csub([],Y,Z).
csub([A|X],[B|Y],Z) <- A = B ∧ cs(A,X,Y,Z).
csub([A|X],[B|Y],Z) <- csub([A|X],Y,Z).
cs(A,X,Y,[B|Z]) <- A = B ∧ csub(X,Y,Z).
cs(A,X,Y,[B|Z]) <- cs(A,X,Y,Z).
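As a sanity check on Example 3.4, the specification and the derived program can be compared on small ground inputs. The Python sketch below is not part of the paper: it reads each clause as a Boolean test on fully instantiated lists (so it ignores the relational use of the predicates), and the helper `all_lists` and the chosen bound are assumptions made for the illustration.

```python
from itertools import product

def subseq(x, y):
    # C1: subseq([],L).
    if not x:
        return True
    if not y:
        return False
    # C2: heads equal and tails related; C3: skip the head of the second list.
    return (x[0] == y[0] and subseq(x[1:], y[1:])) or subseq(x, y[1:])

def csub_spec(x, y, z):
    # Definition clause C: csub(X,Y,Z) <- subseq(X,Y), subseq(X,Z).
    return subseq(x, y) and subseq(x, z)

def csub(x, y, z):
    # Derived program, first three clauses.
    if not x:
        return True
    if not y:
        return False
    a, b = x[0], y[0]
    return (a == b and cs(a, x[1:], y[1:], z)) or csub(x, y[1:], z)

def cs(a, x, y, z):
    # Derived program, last two clauses (new predicate cs/4).
    if not z:
        return False
    return (a == z[0] and csub(x, y, z[1:])) or cs(a, x, y, z[1:])

def all_lists(max_len, alphabet=(0, 1)):
    """All tuples over the alphabet up to the given length."""
    for n in range(max_len + 1):
        yield from product(alphabet, repeat=n)

agree = all(
    csub(x, y, z) == csub_spec(x, y, z)
    for x in all_lists(3) for y in all_lists(3) for z in all_lists(3)
)
print(agree)
```

Agreement on these bounded inputs is only evidence, not a proof; the equivalence itself follows from the meaning-preserving unfold/fold rules.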

Though Proietti and Pettorossi showed one more
class [16], we will not discuss this here.

Now, we get the following theorem. 

Theorem 3.6 Let P be a linear recursive non-ascending
program and C be a linear term definition formula for
P of the form $H \leftarrow A_1 \wedge \forall Y (A_2 \wedge B_2 \rightarrow A_3 \wedge B_3)$, such
that the following hold.

(a) Hypothesis (a) of Lemma 3.5 holds for P.

(b) Let $S_1$ be the set of all the arguments of $A_1$, and
$S_i$ be the set of all the arguments of $A_i$ and $B_i$ for
i = 2, 3. Then, there exist two terms $t_j$ and $s_j$ in
some $S_j$ ($t_j \neq s_j$, j = 1, 2 or 3) such that the following
hold.

(b-1) There exists a term $t_k$ in $S_k$ ($j \neq k$) such that
• $\mathrm{vars}(S_j) \cap \mathrm{vars}(S_k) = \mathrm{vars}(t_j) \cap \mathrm{vars}(t_k)$, and
• either $t_j$ is a subterm of $t_k$, $t_k$ is a subterm of $t_j$, or
$\mathrm{vars}(t_j) \cap \mathrm{vars}(t_k) = \emptyset$.

(b-2) There exists a term $s_l$ of $S_l$ ($l \neq j, k$) such that
the same relations as above hold for $s_j$ and $s_l$.

(b-3) $S_k$ contains no common variable with $S_l$.

Then, there exists a definition formula set $\mathcal{D}$ for P such
that $P \cup \{C\}$ is closed with respect to $\mathcal{D}$.

Proof. Obvious from Theorem 3.3 and Lemma 3.5. □ 

Note that it is easy to extend the result of Theorem 3.6
to allow the conjunction of an arbitrary number of atoms
to appear in the body of the definition formula. Note also
that it is possible to extend the result to allow arbitrary
definitions of $A_3$ and $B_3$, in a similar way to Corollary 3.4.

3.3 Further Consideration about Syntactic Restrictions

As described in 3.2, the application of unfolding may 
be prohibited in Kanamori and Horiuchi’s framework. 
In this subsection, we discuss some methods to avoid 
prohibition, though we do not necessarily give the pre- 
cise syntactic restriction. (Due to space limitations, we 
do not refer to the terminating property, though several 
sufficient conditions are known to guarantee it.) 

(1) Universally Quantified Variables Appearing 
in Positive Atoms 

Positive unfolding can not be applied to definite formulae
at positive atoms with universally quantified variables.
Thus, we have the following two problems.

(a) Synchronized descent rules can not be defined when
universally quantified variables are instantiated by
negative unfolding.

(b) We can not unfold formulae of the form $\forall X\, A$ when
A is an atom and some variables in X appear in A.

To avoid case (a), the following restriction is sufficient:
when applying negative unfolding, no universally
quantified variable is instantiated. Though the restriction
seems to be strong, most significant examples of
program synthesis can be dealt with under the restriction.

Case (b) corresponds to the compilation failure in Sato
and Tamaki’s first order compiler [19]. They restricted
their language as follows. For every implicational goal
$F_1 \rightarrow F_2$ appearing in a formula, $\mathrm{uvar}(F_1) \supseteq \mathrm{uvar}(F_2)$
holds, where $\mathrm{uvar}(F_i)$ means the set of universally
quantified variables appearing in $F_i$.

The above condition is applicable to our problem. Note
that the application of positive unfolding does not
affect the condition. When applying negative unfolding at
atom A in a universally quantified implicational goal G,
the following restriction is also required. All the
universally quantified variables appearing in A also appear
in some negative defined atom in each result of negative
unfolding G, or they are unified with terms consisting of
constants and global variables by reduction w.r.t. =.

We believe that techniques such as mode analysis can be
used to guarantee that every applicable negative
unfolding satisfies the above conditions.

(2) Global Variables Appearing in Negative Atoms

Negative unfolding should be applied without instantiating
global variables. In some cases, this restriction may
be critical. However, we can deal with most of those
cases by adding positive atoms to the formula such that
the global variables can be instantiated by applying
positive unfolding at those atoms. Atoms with predicates
which specify data types (cf. list) can be used. For
example, with the definitions of ‘member’ and ‘<’ in
Example 2.1, negative unfolding can not be applied to the
definite formula below.

$\textit{less-than-all}(X,L) \leftarrow \forall Y (\textit{member}(Y,L) \rightarrow X < Y)$.

However, we can apply negative unfolding to the formula
below, after positive unfolding at list(L).

$\textit{less-than-all}(X,L) \leftarrow \textit{list}(L) \wedge \forall Y (\textit{member}(Y,L) \rightarrow X < Y)$.

(3) Sato’s Unfold/Fold Transformation 

Recently, Sato proposed unfold/fold transformation rules 
for full first order programs [18]. Their unfolding op- 
eration does not require conditions like Kanamori and 
Horiuchi’s. On the other hand, more complex condi- 
tions are required when applying folding. Thus, when 
we adopt Sato’s rules in place of Kanamori and Hori- 
uchi’s, we need not consider the restrictions discussed 
in (1) and (2) above, while some other difficulties are 
introduced to satisfy the folding conditions. 

4 Discussion 

The work described here is an extension of Pettorossi and 
Proietti’s work on program transformation [14, 15, 16]. 
They formalized the successful unfold/fold transforma- 
tion in three ways, and showed that the problem of 
whether a given program can be transformed successfully 
or not is unsolvable. They also showed some classes of 
definite clause programs which can be transformed suc- 
cessfully. Our results owe much to their work, though 
currently we do not know whether our problem is decid- 
able. 

Proietti and Pettorossi also showed that any defi- 
nite clause program can be transformed successfully by 
performing suitable generalization of the atoms to be 
folded [15, 16]. However, the generalization technique
is not applicable to our problem. Folding by a definition
formula obtained by generalizing atoms with universally
quantified variables may not satisfy the conditions for
folding [7], since universally quantified variables can not
appear in the head of the formula.

Proietti and Pettorossi also showed a transformation
procedure called loop absorption [15, 16]. In this
procedure, they found clause C and its descendant clause
C' such that the body of C' is an instance of that of C
(or a subset of the body of C' is identical to the body of C).
Then, a new definition clause whose body is identical to
that of C is constructed. They also showed a procedure to
eliminate unnecessary variables [17]. We can modify our naive
procedure described in 3.1 by incorporating the loop ab- 
sorption and the elimination of unnecessary variables. 
Programs obtained by the modified procedure are ex- 
pected to be more efficient and have less code than those 
obtained by the naive procedure. 

There have been several studies on logic program
synthesis from universally quantified implicational
formulae [3, 4, 19]. Our work is closely related to that of
Dayantis [3]. There, program synthesis was also considered
from formulae of the form $H \leftarrow \forall X (A \rightarrow B)$. They
showed that a class of those formulae can be transformed 
into definite clauses by deductive derivation. They also 
discussed the generality of the class using several exam- 
ples. Their deductive method is analogous to unfold/fold 
transformation and the derivation processes almost cor- 
respond to those by our procedure when our procedure 
does not apply positive unfolding. They also mechanized 
their derivation processes. Our notion of the sound- 
ness of the application of unfolding is ensured by part of 
their syntactic restrictions on the arguments of formulae, 
though we have not discussed how this is ensured. How- 
ever, the classes we have shown are still wider than those 
they showed after we incorporate those restrictions. 

Sato and Tamaki showed a deterministic algorithm to 
transform logic programs with universally quantified im- 
plicational formulae into definite clause programs [19]. 
In their method, unfold/fold transformation is applied 
to universal continuation forms. Their method can be 
applied to a wider class of first order formulas than ours, 
while the results of the compilation are not necessarily 
efficient and the code sizes of those results increase gen- 
erally. 

5 Conclusion 

A logic program synthesis method from some classes of
first order logic specifications has been shown. The
method is based on unfold/fold transformation. Some 
classes of first order formulae which can be transformed 
into definite clause programs by unfold/fold transforma- 
tion have been shown. 
Abstract

We present a procedure for partial deduction of logic pro- 
grams, based on an automatic unfolding algorithm which 
guarantees the construction of sensibly and strongly ex- 
panded, finite SLD-trees. We prove that the partial de- 
duction procedure terminates for all definite logic pro- 
grams and queries. We show that the resulting program 
satisfies important soundness and completeness criteria 
with respect to the original program, while retaining the 
essentially desired amount of specialisation. 

1 Introduction 

Since its introduction in logic programming by Ko- 
morowski ([Komorowski, 1981]), partial evaluation has 
attracted the attention of many researchers in the field. 
Some, e.g. [Venken, 1984], [Venken and Demoen, 1988], 
[Sahlin, 1990], have addressed pragmatic issues re- 
lated to the impurities of Prolog. Others were at- 
tracted by the perspective of eliminating the over- 
head associated with meta interpreters. Some ex- 
amples are: [Gallagher, 1986], [Levi and Sardu, 1988], 
[Safra and Shapiro, 1986], [Sterling and Beer, 1989] and 
[Takeuchi and Furukawa, 1986]. Finally, a firm the- 
oretical basis for the subject was described in 
[Lloyd and Shepherdson, 1991]. 

Just as in [Bruynooghe et al, 1991a], we use the 
term “partial deduction” in this paper, rather than 
the more familiar “partial evaluation”. Following 
[Komorowski, 1989], we do so because we want to leave 
the latter term for works taking into account the non- 
logical features of Prolog and the order in which answers 
are produced. In the present paper, we adhere to the 
viewpoint taken in [Lloyd and Shepherdson, 1991] which 
states that the specialised program should have the same 
answers as the original one. 

* Work partially supported by ESPRIT BRA COMPULOG
(project 3012).

† All authors are supported by the Belgian National Fund for
Scientific Research.



Indeed, the authors of [Lloyd and Shepherdson, 1991] 
present important criteria which, when satisfied by the 
specialised program, guarantee this to be the case. A 
partial deduction procedure imposing these criteria, is 
described in [Benkerimi and Lloyd, 1990]. However, ter- 
mination of this procedure is not guaranteed, not even 
for definite logic programs. In this paper, we propose 
an alternative method which does terminate for all def- 
inite logic programs. A central part of any partial 
deduction procedure is an unfolding algorithm which 
builds the SLD(NF)-trees used as starting point for 
synthesising specialised clauses. In general, termina- 
tion of this unfolding process is problematic in its own 
right. In [Bruynooghe et al, 1991a], a general crite- 
rion for avoiding infinite unfolding is presented. In the 
present paper, we build on those results for formulat- 
ing a terminating procedure for partial deduction, re- 
specting the soundness and completeness conditions of 
[Lloyd and Shepherdson, 1991]. 



The paper is organised as follows. In section 2, we 
recapitulate (and adapt) some basic concepts in par- 
tial deduction from [Lloyd and Shepherdson, 1991], as 
well as the criteria for soundness and completeness pre- 
sented there. We sketch the partial deduction method 
from [Benkerimi and Lloyd, 1990] and show an exam- 
ple on which the unfolding rules mentioned there do 
not terminate. In section 3, we introduce an automatic
algorithm for finite unfolding, adapted from
[Bruynooghe et al, 1991a]. Next, in section 4, our partial
deduction procedure is presented. We give an algorithm
which implements it and prove its termination.
Moreover, we prove that the method satisfies the criteria 
introduced in [Lloyd and Shepherdson, 1991]. We also 
show that the intended specialisation is indeed obtained. 
We conclude the paper in section 5 with a short dis- 
cussion, including a brief comparison with the approach 
of [Benkerimi and Lloyd, 1990] and some directions for 
further research. 
2 Partial Deduction

2.1 Basic concepts, soundness and 
completeness 

We assume familiarity with the basics of logic
programming. Definitions of the following concepts
can be found in [Lloyd and Shepherdson, 1991] and
[Benkerimi and Lloyd, 1990]: most specific generalisation
(msg), incomplete SLD-tree, resultant of a
derivation, partial deduction for an atom in a program,
partial deduction for a set of atoms in a program,
partial deduction of a program wrt a set of
atoms, independence of a set of atoms, $\mathbf{A}$-closedness
of a set of formulas, $\mathbf{A}$-coveredness of a program
and goal. In [Lloyd and Shepherdson, 1991] and
[Benkerimi and Lloyd, 1990], the definitions are given
for normal programs and using the term “partial
evaluation”. In the present paper, we restrict ourselves
to definite programs and goals and, as mentioned
above, use the term “partial deduction”. The necessary
adaptations are straightforward (as exemplified in
[Bruynooghe et al, 1991a]).

We adapt the following theorem from 
[Lloyd and Shepherdson, 1991]. 

Theorem 2.1 Let P be a definite logic program, G a
definite goal, $\mathbf{A}$ a finite, independent set of atoms, and
P' a partial deduction of P wrt $\mathbf{A}$ such that $P' \cup \{G\}$ is
$\mathbf{A}$-covered. Then the following hold:

• $P' \cup \{G\}$ has an SLD-refutation with computed
answer $\theta$ iff $P \cup \{G\}$ does.

• $P' \cup \{G\}$ has a finitely failed SLD-tree iff $P \cup \{G\}$
does.

In other words, under the conditions stated in this theo- 
rem, computation with a partial deduction of a program 
is sound and complete wrt computation with the original 
program. This is clearly a very desirable characteristic 
of any procedure for partial deduction. It is therefore 
important to devise methods for partial deduction that 
ensure the conditions of theorem 2.1 are satisfied. 

In [Benkerimi and Lloyd, 1990], one such method is
presented. Basically, it proceeds as follows. For a given
goal G and program P, a partial deduction for G in P is
computed. This is repeated for any goal occurring in the
resulting clauses which is not an instance of one already
processed. Assuming the procedure terminates, one gets
in this way a set of clauses S and a set $\mathbf{A}$ of partially
deduced atoms such that S is $\mathbf{A}$-closed. But one also
wants $\mathbf{A}$ to be independent. In order to achieve this, the
procedure is modified as follows. Whenever a goal occurring
in S is not an instance (nor a variant) of one in $\mathbf{A}$,
but has a common instance with it, the latter is removed
from $\mathbf{A}$ and a partial deduction is computed for their
msg (which itself is therefore added to $\mathbf{A}$) and added to
S. The original partial deduction for the removed goal
is itself also removed from S. The process stops if $\mathbf{A}$
becomes independent and S $\mathbf{A}$-closed. S can then be used
to synthesize a partial deduction of P wrt $\mathbf{A}$ which
satisfies the conditions of theorem 2.1 for any goal G' which
is an instance of G.

However, the tactic of taking msgs to make $\mathbf{A}$
independent causes an unacceptable loss of specialisation in
the resulting partial deduction. To remedy this, the
authors of [Benkerimi and Lloyd, 1990] introduce a re- 
naming transformation as a pre-processing stage be- 
fore running their algorithm. It amounts to duplicat- 
ing and renaming the definitions of those predicates, oc- 
curring in the original goal G, which are likely to pose 
specialisation problems. The details can be found in 
[Benkerimi and Lloyd, 1990]. 

2.2 Unfolding 

One question is left more or less unanswered until now:
how to obtain the (incomplete) SLD-trees used as a basis
for producing partial deductions? In other words, which
computation rule should be used for building these trees
(including the question of deciding when to stop the
unfolding)? [Benkerimi and Lloyd, 1990] mentions 4 criteria
and proposes the following one as the best: the
computation rule $R_v$ selects the leftmost atom which is
not a variant of an atom already selected on the branch
down to the current goal. However, this rule fails to
guarantee the production of finite SLD-trees in all cases.
We present a counter-example. It is the well-known
“reverse” program with accumulating parameter.

Example 2.2

source program:
reverse([],L,L).
reverse([X|Xs],Ys,Zs) <- reverse(Xs,[X|Ys],Zs).

query:
reverse([1,2|Xs],[],Zs).

The reader can verify that $R_v$ generates an infinite
SLD-tree.
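The behaviour of $R_v$ on this query can be made concrete with a small bounded simulation. The Python sketch below is an illustration under stated assumptions, not the unfolding procedure of [Benkerimi and Lloyd, 1990]: it follows only the recursive branch of the tree, hard-codes the effect of resolving with the second clause of reverse, and uses an assumed term encoding (capitalised strings for variables, tuples for compound terms and atoms). The names `step`, `is_variant` and `rv_branch` are invented for the example.

```python
def is_var(t):
    # Assumed encoding: variables are strings starting with an uppercase letter.
    return isinstance(t, str) and t[:1].isupper()

def cons(head, tail):
    return (".", head, tail)

def from_list(items, tail="[]"):
    """Build a list term, optionally with an open tail such as "Xs"."""
    for item in reversed(items):
        tail = cons(item, tail)
    return tail

def is_variant(s, t, fwd=None, bwd=None):
    """True if s and t are equal up to a consistent, one-to-one variable renaming."""
    fwd = {} if fwd is None else fwd
    bwd = {} if bwd is None else bwd
    if is_var(s) and is_var(t):
        return fwd.setdefault(s, t) == t and bwd.setdefault(t, s) == s
    if isinstance(s, tuple) and isinstance(t, tuple):
        return (len(s) == len(t) and s[0] == t[0]
                and all(is_variant(a, b, fwd, bwd) for a, b in zip(s[1:], t[1:])))
    return not is_var(s) and not is_var(t) and s == t

def step(atom, k):
    """Resolve reverse(L,Acc,Out) with the recursive clause; k numbers fresh variables."""
    _, lst, acc, out = atom
    if is_var(lst):
        # L is bound to a fresh cons cell [Xk|Xsk].
        return ("reverse", f"Xs{k}", cons(f"X{k}", acc), out)
    if isinstance(lst, tuple):
        return ("reverse", lst[2], cons(lst[1], acc), out)
    return None  # L is [], only the fact applies

def rv_branch(query, max_steps):
    """Atoms selected by the variant-checking rule along the recursive branch."""
    selected = [query]
    atom = query
    for k in range(1, max_steps + 1):
        atom = step(atom, k)
        if atom is None or any(is_variant(atom, old) for old in selected):
            return selected  # the rule would refuse to select this atom
        selected.append(atom)
    return selected

query = ("reverse", from_list([1, 2], tail="Xs"), "[]", "Zs")
print(len(rv_branch(query, 50)))
```

Within the bound, the variant check never fires, because the accumulating second argument grows by one element at every step. A bounded run cannot prove non-termination, but it shows why no selected atom is ever a variant of an earlier one.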

Some authors have therefore combined $R_v$ or other
computation rules with a depth bound, among others
[Levi and Sardu, 1988], [Sterling and Beer, 1986],
[Takeuchi and Furukawa, 1986]. This does of course
guarantee finiteness, but it seems a rather ad-hoc
solution which does not reflect any properties of the
given unfolding problem. We therefore proposed
an alternative solution in [Bruynooghe et al., 1991a].
(An extended version of this paper can be found in
[Bruynooghe et al., 1991b].)
3 An Algorithm for Finite Unfolding

In [Bruynooghe et al., 1991a], a general criterion for 
avoiding infinite unfolding during partial deduction and 
a terminating unfolding algorithm based on it, are pre- 
sented. In this section, we introduce a fully auto- 
matic version of that algorithm, tuned towards unfold- 
ing object-level definite logic programs. A slightly more 
sophisticated approach may be desirable when dealing 
with meta interpreters. We will not address that point 
in the present paper and concentrate on object-level pro- 
grams. Although a slightly more accurate presentation of 
the algorithm itself is given, most of what follows now is 
adapted from [Bruynooghe et al., 1991a]. The interested 
reader is referred to that paper for a full (and more gen- 
eral) account with all the technical details on the well- 
founded measures underlying our approach. Here, we 
only introduce what is necessary for a good understand- 
ing of algorithm 3.6. 

For technical reasons, we will assume a numbering on
the nodes of an SLD-tree (e.g. left-to-right, top-down
and breadth-first). We will use the following notation
for nodes in an SLD-tree: (G,i), where G is a goal of the
tree having i as its associated number. (The notations
“(G,i)” and “G” will be used interchangeably, as the
context requires.)

We first define a weight-function on terms. It counts 
the number of functors in its argument. 

Definition 3.1 Let Term denote the set of terms in the
first order language used to define the theory P. We
define $|\cdot| : Term \rightarrow \mathbb{N}$ as follows:

If $t = f(t_1, \ldots, t_n)$, $n > 0$
then $|t| = 1 + |t_1| + \cdots + |t_n|$
else $|t| = 0$

It is then possible to introduce weight-functions on 
atoms. 

Definition 3.2 Let p be a predicate of arity n and
$S = \{a_1, \ldots, a_m\}$, $1 \leq a_k \leq n$, $1 \leq k \leq m$, a set of argument
positions for p. We define $|\cdot|_{p,S} : \{A \mid A$ is an atom with
predicate symbol $p\} \rightarrow \mathbb{N}$ as follows:

$|p(t_1, \ldots, t_n)|_{p,S} = |t_{a_1}| + \cdots + |t_{a_m}|$
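The two weight functions of Definitions 3.1 and 3.2 translate directly into code. The Python sketch below is illustrative only: the term encoding (tuples whose first element is the functor, strings and numbers for variables and constants) and the names `weight` and `atom_weight` are assumptions made for the example, and list terms are written with the usual binary functor `"."`.

```python
def cons(head, tail):
    return (".", head, tail)

def from_list(items, tail="[]"):
    """Build a list term, optionally with an open tail such as "Xs"."""
    for item in reversed(items):
        tail = cons(item, tail)
    return tail

def weight(term):
    """|t| of Definition 3.1: compound terms count 1 plus their arguments,
    variables and constants count 0."""
    if isinstance(term, tuple) and len(term) > 1:
        return 1 + sum(weight(arg) for arg in term[1:])
    return 0

def atom_weight(atom, positions):
    """|p(t1,...,tn)|_{p,S} of Definition 3.2, for a set S of 1-based positions."""
    return sum(weight(atom[k]) for k in positions)

# The query atom of Example 2.2: reverse([1,2|Xs],[],Zs).
query = ("reverse", from_list([1, 2], tail="Xs"), "[]", "Zs")
print(atom_weight(query, {1, 2, 3}), atom_weight(query, {2}))
```

Under this reading, the open list [1,2|Xs] has weight 2 (two list constructors), while the empty list and the variable Zs have weight 0.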

The next two definitions introduce useful relations on 
literals and goals in an SLD-tree. 

Definition 3.3 Let $(G,i) = ((\leftarrow A_1, \ldots, A_j, \ldots, A_n), i)$
be a node in an SLD-tree $\tau$, let $R(G) = A_j$ be the
call selected by the computation rule R, let $H \leftarrow
B_1, \ldots, B_m$ be a clause whose head unifies with $A_j$
and let $\theta = mgu(A_j, H)$ be the most general unifier.
Then (G,i) has a son (G',k) in $\tau$, $(G',k) =
((\leftarrow A_1, \ldots, A_{j-1}, B_1, \ldots, B_m, A_{j+1}, \ldots, A_n)\theta, k)$. We
say that $B_1\theta, \ldots, B_m\theta$ in G' are direct descendents of $A_j$
in G and that $A_j$ in G is a direct ancestor of $B_1\theta, \ldots, B_m\theta$
in G'.

The binary relations descendent and ancestor, defined on
atoms in goals, are the transitive closures of the direct
descendent and direct ancestor relations respectively. For
A an atom in G and B an atom in G', A is an ancestor
of B is denoted as $A >_{pr} B$ (“pr” stands for proof tree).

Notice that we also speak about one goal G' being an an- 
cestor (or descendent) of another goal G. This terminol- 
ogy refers to the obvious relationships between goals in 
an SLD-tree and should not be confused with the proof- 
tree based relationships between literals, introduced in 
the previous definition. The following definition does 
introduce a relationship between goals, based on defini- 
tion 3.3. 

Definition 3.4 Let G and G' denote two different nodes
in an SLD-tree $\tau$. Let R be the computation rule used
in $\tau$. Then G' covers G iff

1. R(G') and R(G) are atoms with the same predicate

2. $R(G') >_{pr} R(G)$

Notice that G' covers G implies that G' is an ancestor of
G.

We need one more piece of terminology. 

Definition 3.5 Let G and G' denote two different nodes
in an SLD-tree $\tau$. We call G' the youngest covering
ancestor of G iff

1. G' covers G

2. For any other node G'' such that G'' covers G, we
have that G'' covers G'

We are now finally able to formulate the following al- 
gorithm: 

Algorithm 3.6

Input
  a definite program P
  a definite goal $\leftarrow A$

Output
  a finite SLD-tree $\tau$ for $P \cup \{\leftarrow A\}$

Initialisation
  $\tau := \{(\leftarrow A, 1)\}$
  $Pr := \emptyset$
  $Terminated := \emptyset$
  $Failed := \emptyset$
  For each recursive predicate p/n in P and
  for the derivation D in $\tau$:
    $S_{p,D} := \{1, \ldots, n\}$

While there exists a derivation D in $\tau$ such that
$D \notin Terminated$ do
  Let (G,i) name the leaf of D
  Select the leftmost atom $p(t_1, \ldots, t_n)$ in G
  satisfying the following condition:
    If p is recursive and there is
    a youngest covering ancestor (G',j) of (G,i) in D
    then $|R(G')|_{p,S_{p,D}^{new}} > |p(t_1, \ldots, t_n)|_{p,S_{p,D}^{new}}$, where
      $S_{p,D}^{new} = S_{p,D} \setminus S_{p,D}^{remove}$ and
      $S_{p,D}^{remove} = \{a_k \in S_{p,D} \mid |p(t_1, \ldots, t_n)|_{p,\{a_k\}} > |R(G')|_{p,\{a_k\}}\}$
  If such an atom $p(t_1, \ldots, t_n)$ can be found
  then
    $R(G) := p(t_1, \ldots, t_n)$
    Let Derive(G,i) name the set of all derivation steps
    that can be performed
    If $Derive(G,i) = \emptyset$
    then
      Add D to Terminated and Failed
    else
      Let Descend(R(G),i) name the set of
      all pairs $((R(G),i),(B\theta,j))$, where
        - B is an atom in the body of a clause
          applied in an element of Derive(G,i)
        - $\theta$ is the corresponding m.g.u.
        - j is the number of the corresponding
          descendent of (G,i)
      Expand D in $\tau$ with the elements of Derive(G,i)
      Add the elements of Descend(R(G),i) to Pr
      For every newly created extension D' of D and
      for every recursive predicate q in P:
        if q = p and (G,i) has a covering ancestor in D
        then $S_{q,D'} := S_{q,D}^{new}$
        else $S_{q,D'} := S_{q,D}$
  else
    Add D to Terminated
Endwhile
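The selection condition at the heart of Algorithm 3.6 can be tried out on the atoms of the running example. The Python sketch below is not a full implementation of the algorithm: it has no unification, no SLD-tree and no Pr relation, and it takes the chain of selected atoms on the recursive branch of the reverse query as given. The function name `check`, the term encoding and the reading of $S^{remove}$ as the set of positions whose weight has grown with respect to the covering ancestor are assumptions made for this illustration.

```python
def cons(head, tail):
    return (".", head, tail)

def from_list(items, tail="[]"):
    for item in reversed(items):
        tail = cons(item, tail)
    return tail

def weight(term):
    # Definition 3.1: compound terms weigh 1 plus their arguments, others 0.
    if isinstance(term, tuple) and len(term) > 1:
        return 1 + sum(weight(arg) for arg in term[1:])
    return 0

def atom_weight(atom, positions):
    # Definition 3.2, with 1-based argument positions.
    return sum(weight(atom[k]) for k in positions)

def check(ancestor, candidate, positions):
    """Return (ok, s_new): s_new drops the positions whose weight increased,
    ok tells whether the weight over s_new strictly decreased."""
    remove = {k for k in positions
              if atom_weight(candidate, {k}) > atom_weight(ancestor, {k})}
    s_new = positions - remove
    ok = atom_weight(ancestor, s_new) > atom_weight(candidate, s_new)
    return ok, s_new

# Atoms along the recursive branch for the query reverse([1,2|Xs],[],Zs).
chain = [
    ("reverse", from_list([1, 2], tail="Xs"), "[]", "Zs"),
    ("reverse", from_list([2], tail="Xs"), from_list([1]), "Zs"),
    ("reverse", "Xs", from_list([2, 1]), "Zs"),
    ("reverse", "Xs1", from_list(["X1", 2, 1]), "Zs"),
]

positions = {1, 2, 3}
trace = []
for ancestor, candidate in zip(chain, chain[1:]):
    ok, s_new = check(ancestor, candidate, positions)
    trace.append((ok, sorted(s_new)))
    if not ok:
        break  # the candidate is not selected; the derivation is terminated
    positions = s_new
print(trace)
```

In this run the growing accumulator (position 2) is dropped at the first comparison, the first argument then decreases twice, and the fourth atom is rejected because its first argument no longer shrinks. This agrees with the leaf rev(Xs',[X',2,1],Zs) of the tree in figure 1.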

We have the following theorem. 

Theorem 3.7 Algorithm 3.6 terminates. If a definite
program P and a definite goal $\leftarrow A$ are given as inputs,
its output $\tau$ is a finite (possibly incomplete) SLD-tree for
$P \cup \{\leftarrow A\}$.

Proof The theorem is an immediate consequence of 
proposition 3.1 in [Bruynooghe et al., 1991a]. □ 

Example 3.8 The SLD-tree generated by algorithm 3.6
for the program and the query from example 2.2 is
depicted in figure 1. (“reverse” has been abbreviated to
“rev”.)

4 Combining These Techniques 

4.1 Introduction 

In the previous section, we introduced an algorithm for
the automatic construction of (incomplete) finite
SLD-trees. In this section, we present sound and complete
partial deduction methods, based on it. Moreover, these
methods are guaranteed to terminate. The following
example shows that this latter property is not obvious, even
when termination of the basic unfolding procedure is
ensured. We use the basic partial deduction algorithm from
[Benkerimi and Lloyd, 1990], together with our
unfolding algorithm.

<- rev([1,2|Xs],[],Zs)
         |
<- rev([2|Xs],[1],Zs)
         |
<- rev(Xs,[2,1],Zs)
     /        \
   □      <- rev(Xs',[X',2,1],Zs)

Figure 1: The SLD-tree for example 3.8.

Example 4.1 For the reverse program with accumulat- 
ing parameter (see example 2.2 for the program and the 
starting query), an infinite number of (finite) SLD-trees 
is produced (see figure 2). This behaviour is caused by 
the constant generation of “fresh” body-literals which, 
because of the growing accumulating parameter, are not 
an instance of any atom that was obtained before. 

In [Benkerimi and Lloyd, 1989], it is remarked that a
solution to this kind of problem can be to truncate atoms
put into $\mathbf{A}$ at some fixed depth bound. However, this
again seems to have an ad-hoc flavour to it, and we
therefore devised an alternative method, described in the next
section.

4.2 An algorithm for partial deduction 

We first introduce some useful definitions and prove a 
lemma. 

Definition 4.2 Let P be a definite program and p a
predicate symbol of the language underlying P. Then a
pp'-renaming of P is any program obtained in the
following way:

• Take P together with a fresh (duplicate) copy of
the clauses defining p.

• Replace p in the heads of these new clauses by some
new (predicate) symbol p' (of the same arity as p).

• Replace p by p' in any number of goals in the bodies
of (old and new) clauses.

<- rev([1,2|Xs],[],Zs)
<- rev([2|Xs],[1],Zs)
<- rev(Xs,[2,1],Zs)
□      <- rev(Xs',[X',2,1],Zs)

□      <- rev(Xs'',[X'',X',2,1],Zs)

<- rev(Xs'',[X'',X',2,1],Zs)

Figure 2: An infinite number of (finite) SLD-trees.

Lemma 4.3 Let P be a definite program and $P_r$ a
pp'-renaming of P. Let G be a definite goal in the language
underlying P. Then the following hold:

• $P_r \cup \{G\}$ has an SLD-refutation with computed
answer $\theta$ iff $P \cup \{G\}$ does.

• $P_r \cup \{G\}$ has a finitely failed SLD-tree iff $P \cup \{G\}$
does.

Proof There is an obvious equivalence between
SLD-derivations and SLD-trees for P and $P_r$. □

Definition 4.4 Let P be a definite program and p a
predicate symbol of the language underlying P. Then
the complete pp'-renaming of P is the pp'-renaming of P
where p has been replaced by p' in all goals in the bodies
of clauses.
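The complete pp'-renaming is a simple syntactic operation on clauses. The Python sketch below illustrates it under assumptions that are not in the paper: a clause is a pair (head, list of body atoms), atoms are tuples whose first element is the predicate name, and predicates are identified by name only (arity is ignored for brevity). The function name `complete_renaming` is invented for the example.

```python
def cons(head, tail):
    return (".", head, tail)

def complete_renaming(program, p, p_new):
    """Complete pp'-renaming: keep the clauses of the program, add a copy of the
    clauses defining p with head predicate p_new, and replace p by p_new in
    every body goal of old and new clauses."""
    def rename(atom):
        return (p_new,) + atom[1:] if atom[0] == p else atom

    renamed = [(head, [rename(b) for b in body]) for head, body in program]
    for head, body in program:
        if head[0] == p:
            renamed.append((rename(head), [rename(b) for b in body]))
    return renamed

# The reverse program with accumulating parameter from example 2.2.
REVERSE = [
    (("reverse", "[]", "L", "L"), []),
    (("reverse", cons("X", "Xs"), "Ys", "Zs"),
     [("reverse", "Xs", cons("X", "Ys"), "Zs")]),
]

for head, body in complete_renaming(REVERSE, "reverse", "reverse'"):
    print(head, "<-", body)
```

After the renaming, no clause body mentions the original predicate any more, which is the property used later in the proof of proposition 4.10.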

Our method for partial deduction can then be formu- 
lated as the following algorithm. 



Algorithm 4.5

Input
  a definite program P
  a definite goal $\leftarrow A = \leftarrow p(t_1, \ldots, t_n)$
  in the language underlying P
  a predicate symbol p', of the same arity as p,
  not in the language underlying P

Output
  a set of atoms $\mathbf{A}$
  a partial deduction $P_r'$ of $P_r$,
  the complete pp'-renaming of P, wrt $\mathbf{A}$

Initialisation
  $P_r$ := the complete pp'-renaming of P
  $\mathbf{A} := \{A\}$ and label A unmarked

While there is an unmarked atom B in $\mathbf{A}$ do
  Apply algorithm 3.6 with $P_r$ and $\leftarrow B$ as inputs
  Let $\tau_B$ name the resulting SLD-tree
  Form $P_{rB}$, a partial deduction for B in $P_r$, from $\tau_B$
  Label B marked
  Let $\mathbf{A}_B$ name the set of body literals in $P_{rB}$
  For each predicate q appearing in an atom in $\mathbf{A}_B$
    Let $msg_q$ name an msg of all atoms having q
    as predicate symbol in $\mathbf{A}$ and $\mathbf{A}_B$
    If there is an atom in $\mathbf{A}$ having q as predicate
    symbol and it is less general than $msg_q$
    then remove this atom from $\mathbf{A}$
    If now there is no atom in $\mathbf{A}$ having q as
    predicate symbol
    then add $msg_q$ to $\mathbf{A}$ and label it unmarked
  Endfor
Endwhile

Finally, construct the partial deduction $P_r'$ of $P_r$ wrt $\mathbf{A}$:
Replace the definitions of the partially deduced
predicates by the union of the partial deductions $P_{rB}$
for the elements B of $\mathbf{A}$.

We illustrate the algorithm on our running example.

Example 4.6

complete renaming of the reverse program:
reverse([],L,L).
reverse([X|Xs],Ys,Zs) <- reverse'(Xs,[X|Ys],Zs).
reverse'([],L,L).
reverse'([X|Xs],Ys,Zs) <- reverse'(Xs,[X|Ys],Zs).

partial deduction for <- reverse([1,2|Xs],[],Zs):
reverse([1,2],[],[2,1]).
reverse([1,2,X|Xs],[],Zs) <- reverse'(Xs,[X,2,1],Zs).

partial deduction for <- reverse'(Xs,[X,2,1],Zs):
reverse'([],[X,2,1],[X,2,1]).
reverse'([X'|Xs],[X,2,1],Zs) <-
    reverse'(Xs,[X',X,2,1],Zs).

msg of reverse'(Xs,[X,2,1],Zs) and
reverse'(Xs,[X',X,2,1],Zs): reverse'(Xs,[X,Y,Z|Ys],Zs)

partial deduction for <- reverse'(Xs,[X,Y,Z|Ys],Zs):
reverse'([],[X,Y,Z|Ys],[X,Y,Z|Ys]).
reverse'([X'|Xs],[X,Y,Z|Ys],Zs) <-
    reverse'(Xs,[X',X,Y,Z|Ys],Zs).

resulting set $\mathbf{A}$:
{reverse([1,2|Xs],[],Zs), reverse'(Xs,[X,Y,Z|Ys],Zs)}

resulting partial deduction:
reverse([1,2],[],[2,1]).
reverse([1,2,X|Xs],[],Zs) <- reverse'(Xs,[X,2,1],Zs).
reverse'([],[X,Y,Z|Ys],[X,Y,Z|Ys]).
reverse'([X'|Xs],[X,Y,Z|Ys],Zs) <-
    reverse'(Xs,[X',X,Y,Z|Ys],Zs).
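The msg step of Example 4.6 can be reproduced with a standard anti-unification routine. The Python sketch below is an illustration rather than the authors' implementation; it assumes the tuple encoding of terms used for illustration here, treats variables with the same name in both atoms as the same variable (as the example does), and introduces generalisation variables named G0, G1, ..., which are assumed not to occur in the input atoms.

```python
def cons(head, tail):
    return (".", head, tail)

def from_list(items, tail="[]"):
    for item in reversed(items):
        tail = cons(item, tail)
    return tail

def msg(s, t, table=None):
    """Most specific generalisation of two terms or atoms by anti-unification.
    Each distinct pair of disagreeing subterms is mapped to one fresh variable."""
    table = {} if table is None else table
    if s == t:
        return s
    if (isinstance(s, tuple) and isinstance(t, tuple)
            and s[0] == t[0] and len(s) == len(t)):
        return (s[0],) + tuple(msg(a, b, table) for a, b in zip(s[1:], t[1:]))
    if (s, t) not in table:
        table[(s, t)] = f"G{len(table)}"
    return table[(s, t)]

# reverse'(Xs,[X,2,1],Zs) and reverse'(Xs,[X',X,2,1],Zs)
first = ("reverse'", "Xs", from_list(["X", 2, 1]), "Zs")
second = ("reverse'", "Xs", from_list(["X'", "X", 2, 1]), "Zs")
print(msg(first, second))
```

The result has the shape reverse'(Xs,[G0,G1,G2|G3],Zs), which is a variant of the atom reverse'(Xs,[X,Y,Z|Ys],Zs) given in the example. Reusing the same table entry for a repeated pair of subterms is what keeps the generalisation most specific.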

We can prove the following interesting properties of 
algorithm 4.5. 

Theorem 4.7 Algorithm 4.5 terminates. 

Proof Due to space restrictions, we refer to 
[Martens and De Schreye, 1992]. □ 

Theorem 4.8 Let P be a definite program, $A =
p(t_1, \ldots, t_n)$ be an atom and p' be a predicate symbol
used as inputs to algorithm 4.5. Let $\mathbf{A}$ be the (finite) set
of atoms and $P_r'$ be the program output by algorithm 4.5.
Then the following hold:

• $\mathbf{A}$ is independent.

• For any goal $G = \leftarrow A_1, \ldots, A_m$ consisting of atoms
that are instances of atoms in $\mathbf{A}$, $P_r' \cup \{G\}$ is
$\mathbf{A}$-covered.

Proof

• We first prove that $\mathbf{A}$ is independent.

From the way $\mathbf{A}$ is constructed in the For-loop, it
is obvious that $\mathbf{A}$ cannot contain two atoms with
the same predicate symbol. Independence of $\mathbf{A}$ is
an immediate consequence of this.

• To prove the second part of the theorem, let $P_r^*$ be
the subprogram of $P_r'$ consisting of the definitions
of the predicates in $P_r'$ upon which G depends. We
show that $P_r^* \cup \{G\}$ is $\mathbf{A}$-closed.

Let A be an atom in $\mathbf{A}$. Then the For-loop in
algorithm 4.5 ensures there is in $\mathbf{A}$ a generalisation of
any body literal in the computed partial deduction
for A in $P_r'$. The $\mathbf{A}$-closedness of $P_r^* \cup \{G\}$ now
follows from the following two facts:

1. $P_r'$ is a partial deduction of a program ($P_r$) wrt
$\mathbf{A}$.

2. All atoms in G are instances of atoms in $\mathbf{A}$.

□



Corollary 4.9 Let P be a definite program, A =
p(t_1, ..., t_n) be an atom and p' be a predicate symbol
used as inputs to algorithm 4.5. Let 𝐀 be the set of
atoms and P_r' be the program output by algorithm 4.5.
Let G = <- A_1, ..., A_m be a goal in the language
underlying P, consisting of atoms that are instances of atoms
in 𝐀. Then the following hold:

• P_r' ∪ {G} has an SLD-refutation with computed
answer θ iff P ∪ {G} does.

• P_r' ∪ {G} has a finitely failed SLD-tree iff P ∪ {G}
does.

Proof The corollary is an immediate consequence of 
lemma 4.3 and theorems 2.1 and 4.8. □ 

Proposition 4.10 Let P be a definite program and A
be an atom used as inputs to algorithm 4.5. Let 𝐀 be
the set of atoms output by algorithm 4.5. Then A ∈ 𝐀.

Proof A is put into 𝐀 in the initialisation phase. From
definition 4.4, it follows that no clause in P_r contains a
condition literal with the same predicate symbol as A.
Therefore, A will never be removed from 𝐀. □

This proposition ensures that algorithm 4.5 does
not suffer from the kind of specialisation loss mentioned
in section 2.1: the definition of the predicate which
appears in the query <- A, used as starting input for the
partial deduction, will indeed be replaced by a partial
deduction for A in P in the program output by the
algorithm.

Finally, we have: 

Corollary 4.11 Let P be a definite program, A =
p(t_1, ..., t_n) be an atom and p' be a predicate symbol
used as inputs to algorithm 4.5. Let P_r' be the program
output by algorithm 4.5. Then the following hold for any
instance A' of A:

• P_r' ∪ {<- A'} has an SLD-refutation with computed
answer θ iff P ∪ {<- A'} does.

• P_r' ∪ {<- A'} has a finitely failed SLD-tree iff
P ∪ {<- A'} does.

Proof The corollary immediately follows from corol- 
lary 4.9 and proposition 4.10. □ 

Theorem 4.7 and corollary 4.11 are the most impor- 
tant results of this paper. In words, their contents can 
be stated as follows. Given a program and a goal, algo- 
rithm 4.5 produces a program which provides the same 
answers as the original program to the given query and 
any instances of it. Moreover, computing this (hopefully 
more efficient) program terminates in all cases. 
5 Discussion and Conclusion

In [Lloyd and Shepherdson, 1991], important criteria en- 
suring soundness and completeness of partial deduc- 
tion are introduced. In the present paper, we started 
from a recently proposed strategy for finite unfolding 
([Bruynooghe et al., 1991a]) and developed a procedure
for partial deduction of definite logic programs. We 
proved this procedure produces programs satisfying the 
mentioned criteria and, in an important sense, showing 
the desired specialisation. Moreover, the algorithm ter- 
minates on all definite programs and goals. 

The unfolding method as it is presented in section 3 
was proposed in [Bruynooghe et al., 1991a], but appears
here for the first time in this detailed and automati- 
sable form, specialised for object level programs. It 
tries to maximise unfolding while retaining termination. 
We know, however, of two classes of programs where 
the first goal is not achieved. First, meta programs 
require a somewhat more refined control of unfolding. 
This issue is addressed in [Bruynooghe et al., 1991a].
We refer the interested reader to that paper (or to 
[Bruynooghe et al., 1991b]) for further comments on this
topic. Second, (datalog) programs where the information 
contained in constants appearing in the program text 
plays an important role, are not treated in a satisfactory 
way. Further research is necessary to improve the unfold- 
ing in this case. (A combination of our rule with the Ry 
computation rule seems promising.) As far as the used 
unfolding strategy does maximise unfolding, however, it 
probably diminishes or eliminates the need for dynamic 
renaming as proposed in [Benkerimi and Hill, 1989]. 

We now briefly compare algorithm 4.5 with the partial
deduction procedure with static renaming presented
in [Benkerimi and Lloyd, 1990]. First, we showed above
that our procedure terminates for all definite programs
and queries while the latter does not. The cause
of this difference in behaviour is (apart from the
unfolding strategy used) the way in which msg's are
taken. We do this predicatewise, while the authors of
[Benkerimi and Lloyd, 1990] only take an msg when this 
is necessary to keep A independent. This may keep more 
specialisation (though only for predicates different from 
the one in the starting goal), but causes non-termination 
whenever an infinite, independent set A is generated (as 
illustrated in example 4.1). Observe, moreover, that we 
have kept a clear separation between the issues of control 
of unfolding and of ensuring soundness and complete- 
ness. The use of algorithm 3.6 — or further refinements 
(see above) — guarantees that all sensible unfolding — 
and therefore specialisation — is obtained. The way in 
which algorithm 4.5, in addition, ensures soundness and 
completeness, takes care that none of the obtained
specialisation is undone. Therefore, it does not seem
worthwhile to consider more than one msg per predicate. Note
that one can even consider restricting the partial
deduction to the predicate in the starting query and simply
retaining the original clauses for all other predicates in 
the result program. This can perhaps be formalised as a 
partial deduction where only a 1-step trivial unfolding is 
performed for these predicates. 

Next, the method in [Benkerimi and Lloyd, 1990] is 
formulated in a somewhat more general framework than 
the one presented here. A reformulation of the latter 
incorporating the concept of L-selectability and allow- 
ing more than one literal in the starting query seems 
straightforward. However, a generalisation to normal 
programs and queries and SLDNF-resolution while re- 
taining the termination property, is not immediate. In 
e.g. [Benkerimi and Lloyd, 1990], it is proposed that 
during unfolding, negated calls can be executed when 
ground and remain in the resultant when non-ground. 
This of course jeopardises termination, since termina- 
tion of “ordinary” ground logic program execution is not 
guaranteed in general. One solution is restricting at- 
tention to specific subclasses of programs (e.g. acyclic 
or acceptable programs, see [Apt and Bezem, 1990], 
[Apt and Pedreschi, 1990]). Another might be to use an 
adapted version of our unfolding criterion in the evalu- 
ation of the ground negative call, and to keep the lat- 
ter one in the resultant whenever the SLD(NF)-tree pro- 
duced is not a complete one. Yet a third way might be 
offered by the use of more powerful techniques related to 
constructive negation (see [Chan and Wallace, 1989]). 

Finally, [Gallagher and Bruynooghe, 1990] presents 
another approach to partial deduction focusing both on 
soundness and completeness and on control of unfolding. 
The main difference is the control of unfolding by a con- 
dition based on maximal deterministic paths, where our 
approach is based on maximal data consumption, moni- 
tored through well-founded measures. 
Abstract

We extend the notions of 'recurrency' and 'acceptability'
of a logic program, which were defined in the work of
M. Bezem and in the work of K. R. Apt and D. Pedreschi,
respectively, and which were shown to be equivalent
to, respectively, termination under an arbitrary
computation rule and termination under the Prolog computation
rule. We show that these equivalences still hold for the
extended definitions. The main idea is that instead of
measuring ground instances of atoms, all possible calls
are measured (which are not necessarily ground). By
doing so, a more practical technique is obtained, in the
sense that "more natural" measures can be used, which
can easily be found automatically.

1 Introduction 

In the last few years, a strong research effort in the field 
of logic programming has addressed the issue of termina- 
tion. From the more theoretical point of view, the results 
obtained by Vasak and Potter [1986], Baudinet [1988], 
Bezem [1989], Cavedon [1989], Apt and Pedreschi [1990], 
and Bossi et al. [1991] have provided several frameworks 
and basic techniques to formulate and solve questions 
regarding the termination of logic programs in semanti- 
cally clear and general terms. Other researchers, such 
as Ullman and Van Gelder [1988], Plümer [1990], Wang
and Shyamasundar [1990], Verschaetse and De Schreye 
[1991], and Sohn and Van Gelder [1991] have provided 
practical and automatable techniques for proving the ter- 
mination of logic programs with respect to certain classes 
of queries at compile time. 

In this paper, we propose an extension of the theoretical
frameworks for the characterisation of terminating
programs and queries proposed in [Bezem 1989] and
[Apt and Pedreschi 1990]. The framework not only
provides slightly more general results, but also increases
the practicality of the techniques in view of automation.

* Supported by the National Fund for Scientific Research.

† Supported by ESPRIT BRA COMPULOG project nr. 3012.



Let us recall some definitions from [Bezem 1989] in 
order to explain our motivation and the intuition behind 
our approach. 

Definition 1.1 (see [Bezem 1989]; Definition 2.1) A level
mapping for a definite logic program P is a mapping
|.| : B_P → ℕ.

Definition 1.2 (see [Bezem 1989]; Definition 2.2) A
definite logic program P is recurrent if there exists a
level mapping |.|, such that for each ground instance
A <- B_1, ..., B_n of a clause in P, |A| > |B_i|, for each
i = 1, ..., n.

Definition 1.3 (see [Bezem 1989]; Definition 2.7) A
definite logic program P is terminating if all SLD-derivations
for (P, <- G), where G is a ground goal, are finite.

One of the basic results of [Bezem 1989] is that a pro- 
gram is recurrent if and only if it is terminating. Al- 
though this result is very interesting from a theoretical 
perspective, it is not a very practical one in terms of au- 
tomated detection of terminating programs and queries. 
The problem comes from the fact that the definition of
recurrency requires that the level mapping "compares"
the head of each ground instance of a clause with every
corresponding atom in the body and imposes a
decrease. Intuitively, what would be preferable is to obtain
a well-founding based on a measure function (or level
mapping), which only decreases on each recursive call to
the same predicate. This corresponds better to our
intuition, since nontermination (for pure logic programs) can
only be caused by infinite recursion.

As we stated above, the problem is not merely related 
to our intuition on the cause of nontermination, but more 
importantly to the practicality of level mappings. Con- 
sider the following example. 

Example 1.4 

p([]).
p([H|T]) <- q([H|T]), p(T).
q([]).
q([H|T]) <- q(T).
It is not possible to take as level mapping a function
that maps ground instances p(x) and q(x) to the same
level, namely list-length(x) if x is a ground list, and 0
otherwise. Instead, the definition of recurrency obliges
us to take a level mapping that has an "unnatural" offset
(1 in this case).

|p(x)| = list-length(x) + 1
|q(x)| = list-length(x).
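
The need for the offset can be checked on a finite sample of ground instances. The sketch below is only a bounded sanity check, not a proof; the encoding of atoms as (predicate, list) pairs and the names `natural` and `offset` are illustrative assumptions.

```python
def ground_instances(max_len):
    """Ground instances of the two recursive clauses of Example 1.4."""
    for n in range(1, max_len + 1):
        xs = list(range(n))            # some ground list [H|T] of length n
        tail = xs[1:]
        # p([H|T]) <- q([H|T]), p(T).
        yield ("p", xs), [("q", xs), ("p", tail)]
        # q([H|T]) <- q(T).
        yield ("q", xs), [("q", tail)]

def decreases_everywhere(level, max_len=6):
    """Bezem-style check: the head must exceed every body atom."""
    return all(level(head) > level(atom)
               for head, body in ground_instances(max_len)
               for atom in body)

def natural(atom):
    """|p(x)| = |q(x)| = list-length(x)."""
    return len(atom[1])

def offset(atom):
    """|p(x)| = list-length(x) + 1, |q(x)| = list-length(x)."""
    return len(atom[1]) + (1 if atom[0] == "p" else 0)

natural_ok = decreases_everywhere(natural)
offset_ok = decreases_everywhere(offset)
```

On this sample the natural mapping fails, because p([H|T]) and q([H|T]) receive the same level, while the mapping with offset 1 passes.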

In a naive attempt to improve on the results of
[Bezem 1989], one could try to start from an adapted
definition for a recurrent program, in which the relation
|A| > |B_i| would only be required if A and B_i are atoms
with the same predicate symbol. However, the
equivalence with termination would immediately be lost,
even for programs having only direct recursion, as the
following example shows.

Example 1.5 

append([], L, L).
append([H|S], T, [H|U]) <- append(S, T, U).
p([H|T]) <- append(X, Y, Z), p(T).

An "extended" notion of recurrency, where the level
mapping only relates the measure of ground instances of
the recursive calls, would hold with respect to the level
mapping:

|p(x)| = list-length(x)
|append(x, y, z)| = list-length(x).

On the other hand, the program is clearly not terminating:
if it were terminating, then we would have
shown that append/3 terminates for a call with all three
arguments free.

The heart of the problem is that in the definition of 
recurrency, the level mapping is used for two quite dis- 
tinct purposes at the same time. First, the level mapping 
does ensure that on each derivation step, the measure of 
a recursive descending call is smaller than the measure of 
the ancestor call (or at least: for each ground instance of 
such a derivation step). Second, since we are only given 
that the top level goal is ground (or, in a more general 
version of the theorem, bounded) — but we have no in- 
formation on the instantiation of any of the descending 
calls — the level mapping is also used to ensure that we 
have some upper limit on the measures for the calls of 
the (independent) recursive subcomputation evoked by 
the original call. In the current definition, this is done
by imposing that the level also decreases between a call 
and its descendants that are not related through recur- 
sion. 

The way in which we address the problem here, differs 
from the approach in [Bezem 1989] in three ways: 



1. We first compute all atoms that can occur as calls 
during any SLD-derivation for the top-level goal(s) 
under consideration. 

2. We use an extended notion of level mapping, defined 
on all such atoms — not only the ground ones. 

3. We have an adapted definition of recurrency, with 
as its most important features: 

(a) the condition |A| > |B_i| is not imposed on
ground instances of a clause, but instead, on
each instance obtained after unification with a
(possible) call,

(b) the decrease |A| > |B_i| is only imposed if A
and B_i are calls to the same predicate symbol.
(This is for direct recursion; in the context of
indirect recursion, the condition is more complex.)

One of the side effects of taking this approach is
that it is no longer necessary to start the analysis
from one ground or bounded goal. The technique works
equally well when we start from any general set of 
atoms. The additional advantage that we gain here is 
that in practice, we are usually interested in the ter- 
mination properties of a program with respect to some 
call pattern. Such call patterns can always be speci- 
fied in terms of abstract properties of the arguments in 
the goals through mode information, type information 
or combined (rigid or integrated) mode and type infor- 
mation (see [Janssens and Bruynooghe 1990]). Any such 
call pattern corresponds to a set of atoms in the con- 
crete domain, and can therefore be analysed with our 
approach. 

The paper is organised as follows. In the next sec- 
tion we extend the equivalence theorem of [Bezem 1989] 
in the way described above. In section 3 we take 
a completely similar approach to extend results of 
[Apt and Pedreschi 1990] on left termination. In sec- 
tion 4, we illustrate the improved practicality of 
the new framework. We also indicate how some 
simple extensions are likely to provide full theoreti- 
cal support for the automated technique proposed in 
[Verschaetse and De Schreye 1991]. 

All proofs have been omitted from the paper. They 
can be found in [De Schreye and Verschaetse 1992]. 

2 Recurrency with respect to a 
set of atoms 

We first introduce some conventions and recall some
basic terminology. Throughout the paper, P will
denote a definite logic program. The extended Herbrand
Universe, U_P^E, and the extended Herbrand Base,
B_P^E, associated to a program P, were introduced in
[Falaschi et al. 1989]. They are defined as follows. Let
Term_P and Atom_P denote the sets of, respectively, all
terms and all atoms that can be constructed from the
alphabet underlying P. The variant relation,
denoted ≈, defines an equivalence. U_P^E and B_P^E are
respectively the quotient sets Term_P/≈ and Atom_P/≈.
For any term t (or atom A), we denote its class in U_P^E
(B_P^E) as t̃ (Ã). There is a natural partial order on U_P^E
(and B_P^E), defined as: s̃ ≤ t̃ if there exist
representants s' of s̃ and t' of t̃ in Term_P and a substitution
θ, such that s' = t'θ. Throughout the paper, S will
denote a subset of B_P^E. We define its closure under ≤ as:
S^c = {Ã ∈ B_P^E | ∃B̃ ∈ S : Ã ≤ B̃}.

Definition 2.1 P is terminating with respect to S if for
any representant A' of any element Ã of S, every
SLD-tree for (P, <- A') is finite.

If B_P denotes the classical notion of a Herbrand Base (of
ground atoms) over P, then, with the terminology
of [Bezem 1989], we have:

Lemma 2.2 P is terminating if and only if it is
terminating with respect to B_P.

Lemma 2.3 If all SLD-derivations for (P, <- A) are finite,
and θ is any substitution, then all SLD-derivations for
(P, <- Aθ) are finite.

From lemma 2.3 it follows that in order to verify
definition 2.1 for a set S ⊆ B_P^E, it suffices to verify the
finiteness of the SLD-trees for (P, <- A) for only one
representant of each element in S. It also follows that P is
terminating with respect to a set S ⊆ B_P^E if and only if it
is terminating with respect to S^c. In fact, given that P
terminates with respect to S, it will in general be
terminating with respect to a larger set of atoms than those in
S^c. It is clear that if all SLD-trees for (P, <- A) are finite,
and if H <- B_1, ..., B_n is a clause in P, such that A and
H unify, then all SLD-trees for (P, <- B_i θ), i = 1, ..., n,
where θ = mgu(A, H), are finite. We can characterise
the complete set of terminating atoms associated to a
given set S as follows.

Definition 2.4 For any T ⊆ B_P^E, define T_P^{-1}(T) =
{B_i θ ∈ B_P^E | A' is a representant of Ã ∈ T, H
<- B_1, ..., B_n is a clause in P, θ = mgu(A', H) and
1 ≤ i ≤ n}.

Denote ℜ_S = {T ∈ 2^{B_P^E} | S^c ⊆ T}. ℜ_S is a complete
lattice with bottom element S^c.

Definition 2.5 R_S : ℜ_S → ℜ_S : R_S(T) = T ∪ T_P^{-1}(T)^c.

Lemma 2.6 R_S is continuous.

As a result, the least fixpoint of R_S is R_S↑ω.



Lemma 2.7 P is terminating with respect to S if and
only if P is terminating with respect to R_S↑ω.

As a result of our construction (in fact: as the very
purpose of it), R_S↑ω contains every call in every SLD-tree
for any atomic goal of S. Formally:

Proposition 2.8 Let call(P, S) denote the set of all
atoms B, such that B is the subgoal selected by the
computation rule in some goal of some SLD-tree for a
pair (P, <- A), with A the representant of an element of
S. Then, call(P, S) ⊆ R_S↑ω.

We now introduce a variant of the definition of a level
mapping, where the mapping is defined on equivalence
classes of calls.

Definition 2.9 (level mapping)

A level mapping with respect to a set S ⊆ B_P^E is a function
|.| : R_S↑ω → ℕ. A level mapping |.| is called rigid
if for all Ã ∈ R_S↑ω and for any substitution θ, |A| =
|Aθ|, i.e. the level of an atom remains invariant under
substitution.

With slight abuse of notation, we will often write |A|,
where A is a representant of Ã ∈ B_P^E. The associated
notion of recurrency with respect to S will not be defined
on ground instances of clauses, but instead on all
instances (H <- B_1, ..., B_n)θ of clauses H <- B_1, ..., B_n of
P, such that θ = mgu(A, H), where A is a representant
of an element of R_S↑ω. The definition in [Bezem 1989]
does not explicitly impose a decrease of the level mapping
at each inference step. The level mapping's values
should only decrease for ground instances of clauses. By
considering more general instances of clauses (as above),
we can explicitly impose a decrease of the level mapping's
value during (recursive) inference steps. As a result, the
adapted level mapping no longer needs to perform
different functionalities at once, and we can concentrate on
the real structure of the recursion.

Now, concerning this recursive structure, there are a 
number of different possibilities for a new definition of 
recurrency, depending on how we aim to deal with indi- 
rect recursion. In order not to confuse all issues involved, 
we first provide a definition for programs P, relying only 
on direct recursion. 

Definition 2.10 A (directly recursive) program P is
recurrent with respect to S, if there exists a level mapping
|.| with respect to S, such that:

• for any A' representant of Ã ∈ R_S↑ω,

• for any clause H <- B_1, ..., B_n in P, such that
mgu(A', H) = θ exists,

• for any atom B_i, 1 ≤ i ≤ n, with the same predicate
symbol as H: |A'| > |B_i θ|.
What is expressed in this definition is that for any two
recursively descending calls with the same predicate symbol
in any SLD-tree for (representants of) atoms in S,
the level mapping's value should decrease. This condition
has the advantage of being perfectly natural and
therefore, of being easy to verify in an automated way.
The only possible problem in view of automation is that
it requires the computation of R_S↑ω. But, this problem
is precisely the type of problem that can easily be solved
(or approximated) through abstract interpretation (see
section 4).
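
As a rough illustration of definition 2.10, the following Python sketch checks the decrease only between a call and body atoms with the same predicate symbol, for the program of Example 1.4. It assumes, beyond what the text states, that S consists of calls p(x) and q(x) with x a ground list, and it only samples calls up to a fixed length, so it is a sanity check rather than a proof.

```python
def body_calls(pred, xs):
    """Body atoms of the recursive clause selected by the call pred(xs),
    where xs is a non-empty ground list (after applying the mgu)."""
    tail = xs[1:]
    if pred == "p":                    # p([H|T]) <- q([H|T]), p(T).
        return [("q", xs), ("p", tail)]
    return [("q", tail)]               # q([H|T]) <- q(T).

def level(atom):
    """The natural mapping: |p(x)| = |q(x)| = list-length(x)."""
    return len(atom[1])

def recurrent_on_sample(max_len=6):
    for n in range(1, max_len + 1):
        xs = list(range(n))
        for pred in ("p", "q"):
            call = (pred, xs)
            for atom in body_calls(pred, xs):
                # Only recursive calls to the same predicate are compared.
                if atom[0] == pred and not level(call) > level(atom):
                    return False
    return True

sample_ok = recurrent_on_sample()
```

Under these assumptions the natural mapping without offset is accepted, since the call q([H|T]) in the body of the clause for p is no longer compared with the head.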

In the presence of indirect recursion, we need a more
complex definition, that deals with the problem that a
recursive call with the same predicate symbol as an ancestor
call may only appear after a finite number of inference
steps (instead of in the body of the particular instance
of the applied clause). This can be done in several ways.
We first provide a definition related to the concept of a 
resultant of a finite (incomplete) derivation. Based on 
this definition, we prove the equivalence with termina- 
tion. After that, we provide a more practical condition, 
of which definition 2.10 is an obvious instance for the 
case of direct recursion. 

First, we need some additional terminology. 

Definition 2.11 Let A be an atom and (G_0 = <- A),
G_1, G_2, ..., G_n, (n > 0), a finite, incomplete
SLD-derivation for (P, <- A). Let θ_1, ..., θ_n be the
corresponding sequence of substitutions, and let θ =
θ_1 θ_2 ⋯ θ_n and G_n = <- B_1, ..., B_m. With the
terminology of [Lloyd and Shepherdson 1991] we say that
Aθ <- B_1, ..., B_m is the resultant of the derivation.

Definition 2.12 A resultant Aθ <- B_1, ..., B_m of a
derivation (G_0 = <- A), G_1, ..., G_n, is a recursive
resultant for A if there exists i (1 ≤ i ≤ m), such that B_i has
the same predicate symbol as A.

Definition 2.13 (recurrency wrt a set of atoms)

A program P is recurrent with respect to S, if there exists
a level mapping, |.|, with respect to S, such that:

• for any A' representant of Ã ∈ R_S↑ω,

• for any recursive resultant A'θ <- B_1, ..., B_m, for A',

• for any atom B_i, 1 ≤ i ≤ m, with the same predicate
symbol as A': |A'| > |B_i|.

Proposition 2.14 If P is recurrent with respect to S,
then P terminates with respect to S.

Just as in the framework of Bezem, the converse state- 
ment holds as well. 

Theorem 2.15 

P is recurrent with respect to S if and only if it is
terminating with respect to S.



One of the nice consequences of this result is that we 
can now relate the concept of a recurrent program in the 
sense of [Bezem 1989] to recurrency with respect to a set 
of (ground) atoms. 

Corollary 2.16 P is recurrent if and only if it is
recurrent with respect to B_P.

It may seem surprising to the reader that two apparently
very different notions such as recurrency and recurrency
with respect to B_P coincide. It is our experience
from our work in termination of unfolding in the context
of partial deduction ([Bruynooghe et al. 1991]) that this
is not unusual. The reason is that conditions occurring
in these contexts require the "existence" of some
well-founded measure. The specific properties of such measures
can take totally different forms without losing the
termination property. The only real difference lies in the
practicality.

We conclude the section by introducing a condition 
that implies definition 2.13. This condition has the ad- 
vantage over definition 2.13 that it does not rely on the 
verification of some property for each of a potentially 
infinite number of recursive resultants. Instead it only 
requires such a verification for a finite number of clauses, 
which can be characterised through the minimal, cyclic 
collections of P . 

Definition 2.17 (minimal cyclic collection) 

A minimal cyclic collection of P is a finite sequence of 
clauses of P : 

A_1 <- B^1_1, ..., A^1_2, ..., B^1_{n_1}
  ⋮
A_m <- B^m_1, ..., A^m_{m+1}, ..., B^m_{n_m}

such that:

• for each pair i ≠ j, the heads of the clauses, A_i
and A_j, are atoms with distinct predicate symbols,

• A_i and A^{i-1}_i have the same predicate symbols (1 < i ≤ m),

• A^m_{m+1} has the same predicate symbol as A_1.

Only a finite number of minimal cyclic collections exists. 
They can easily be characterised and computed from the 
predicate dependency graph for P. 
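
A minimal sketch of such a computation, in Python, is given below. It abstracts every clause to the predicate symbol of its head and the predicate symbols occurring in its body; the clause names and the four-clause toy program (p <- p, p <- q, q <- p, q <- q) are illustrative assumptions, and rotations of a cycle are listed as separate collections.

```python
def minimal_cyclic_collections(clauses):
    """clauses maps a clause name to (head predicate, body predicates).
    Returns the sequences of clauses with pairwise distinct head
    predicates in which each body calls the head of the next clause
    and the last body calls the head of the first clause."""
    found = []

    def extend(path, used):
        _, last_body = clauses[path[-1]]
        first_head = clauses[path[0]][0]
        if first_head in last_body:            # the cycle closes here
            found.append(tuple(path))
        for name, (head, _) in clauses.items():
            if head in last_body and head not in used:
                extend(path + [name], used | {head})

    for name, (head, _) in clauses.items():
        extend([name], {head})
    return found

toy = {
    "cl1": ("p", ["p"]),
    "cl2": ("p", ["q"]),
    "cl3": ("q", ["p"]),
    "cl4": ("q", ["q"]),
}
collections = minimal_cyclic_collections(toy)
```

Because a path never reuses a head predicate, its length is bounded by the number of predicates, which is one way to see that only finitely many collections exist.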

Proposition 2.18 

Let S ⊆ B_P^E and |.| a rigid level mapping with respect to
S, such that for any minimal cyclic collection of P (after
standardizing apart),

A_1 <- B^1_1, ..., A^1_2, ..., B^1_{n_1}
  ⋮
A_m <- B^m_1, ..., A^m_{m+1}, ..., B^m_{n_m}

and for any Ã_1, ..., Ã_m ∈ R_S↑ω with A''_1, ..., A''_m as
their respective representants, and θ_i = mgu(A_i, A''_i),
(1 ≤ i ≤ m), the following condition holds:

|A^1_2 θ_1| ≥ |A''_2|
  ⋮
|A^{m-1}_m θ_{m-1}| ≥ |A''_m|
  ⇓
|A''_1| > |A^m_{m+1} θ_m|.

Then, P is recurrent with respect to S.

The conditions in proposition 2.18 seem rather unnatural
at first sight and need some clarification. First,
observe that in the case of direct recursion (except for the
rigidity of the level mapping) the conditions coincide
with those of definition 2.10.

For the case of indirect recursion, the conditions that
one would intuitively expect, are that for each minimal
cyclic collection

A_1 <- B^1_1, ..., A^1_2, ..., B^1_{n_1}
  ⋮
A_m <- B^m_1, ..., A^m_{m+1}, ..., B^m_{n_m}

and each A''_1 representant of Ã_1 ∈ R_S↑ω, such that θ_1 =
mgu(A''_1, A_1) and θ_i = mgu(A^{i-1}_i, A_i), 1 < i ≤ m, exist and
are consistent, we have

|A''_1| > |A^m_{m+1} θ_1 ⋯ θ_m|.

The problem is that such a condition is not correct. Con- 
sider the clauses: 

p(a, [_|X]) <- p(b, X).      (cl1)
p(b, X) <- q(a, [_|X]).      (cl2)
q(b, X) <- p(a, [_|X]).      (cl3)
q(a, [_|X]) <- q(b, X).      (cl4)

There are 4 associated minimal collections: (cl1),
(cl2,cl3), (cl3,cl2) and (cl4). Consider for instance
the derivation <- p(a, [_, _]), <- p(b, [_]), <- q(a, [_, _]).

The problem is caused by resultants associated to 
derivations that start with a clause from one minimal 
cyclic collection — say (cl2) in the collection (cl2,cl3) — 
then shift to applying another collection, (cl4), and only 
after this resume the first collection and apply clause 
(cl3). The head of the third clause, q(b, X), does not 
unify with q(a, [_|X']), and therefore, the condition on 
the cycle (cl2,cl3) can not be applied. 
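
The looping behaviour of these four clauses can be replayed with a small Python sketch. It relies on an abstraction that is an assumption of this illustration: since the list only contains anonymous variables, a goal is represented, up to renaming, by its predicate, its constant and the length n of the list.

```python
def step(goal):
    """Resolve the goal (pred, const, n) with the only applicable clause."""
    pred, const, n = goal
    if pred == "p" and const == "a" and n >= 1:
        return ("p", "b", n - 1)       # cl1: p(a,[_|X]) <- p(b,X).
    if pred == "p" and const == "b":
        return ("q", "a", n + 1)       # cl2: p(b,X) <- q(a,[_|X]).
    if pred == "q" and const == "b":
        return ("p", "a", n + 1)       # cl3: q(b,X) <- p(a,[_|X]).
    if pred == "q" and const == "a" and n >= 1:
        return ("q", "b", n - 1)       # cl4: q(a,[_|X]) <- q(b,X).
    return None                        # no clause head unifies

def run(goal, limit=20):
    """Follow the derivation until it fails, repeats a goal, or hits limit."""
    trace = [goal]
    for _ in range(limit):
        nxt = step(trace[-1])
        if nxt is None:
            return trace, None
        if nxt in trace:
            return trace, nxt          # a goal repeats: infinite derivation
        trace.append(nxt)
    return trace, None

trace, repeated = run(("p", "a", 2))
```

Starting from <- p(a, [_, _]), the trace visits p(b, [_]), q(a, [_, _]) and q(b, [_]) and then returns to the initial goal, although the list length decreases in every application of (cl1) and (cl4).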

So, we have to impose the condition in proposition
2.18. It states that, even if the next call in the traversal
of a minimal collection (A''_i) is not really related (as
an instance) to a call we obtained earlier (A^{i-1}_i θ_{i-1}), but
if, through the intermediate computation in another
minimal collection, the level between these two has
decreased anyway, then the final conclusion between the
original call to the collection and the indirectly depending
one must still hold. We will not discuss the condition
any further here, but we will return to its practicality in
section 4.



3 Acceptability with respect to 
a set of atoms 

All definitions and propositions from the previous sec- 
tion can be specialised for the Prolog computation rule. 
Following [Apt and Pedreschi 1990], we call an
SLD-derivation that uses Prolog's left-to-right computation
rule, an LD-derivation.

Definition 3.1 (left termination wrt S) Let S be
a subset of B_P^E. A program P is left-terminating with
respect to S if for any representant A of any element of
S, every LD-derivation for (P, <- A) is finite.

Recall definitions 2.4 and 2.5. The motivation behind
these definitions was finding an overestimation of all calls
that are possible in any SLD-derivation using an
arbitrary computation rule. The fact that no fixed
computation rule is used, forces us to take the closure under all
possible instantiations in definition 2.5, and hence R_S↑ω
contains in general a lot more calls than can really occur
when a particular computation rule is chosen.

In this section, we focus our analysis on computations
that use Prolog's left-to-right computation rule.
Therefore, adapted definitions of the T_P^{-1} and R_S functions are
needed.

Definition 3.2 For any T ⊆ B_P^E, define: P_P^{-1}(T) =
{B_i θ σ_1 ⋯ σ_{i-1} ∈ B_P^E | A' is a representant of Ã ∈ T,
H <- B_1, ..., B_n is a clause in P, θ = mgu(A', H), 1 ≤
i ≤ n, ∃σ_1, ..., σ_{i-1}, such that ∀j = 1, ..., i-1: σ_j is an
answer for (P, <- B_j θ σ_1 ⋯ σ_{j-1})}.

The answer substitutions σ_j are computed using
LD-resolution. Let ℜ_S^pr denote {T ∈ 2^{B_P^E} | S ⊆ T}.

Definition 3.3 R_S^pr : ℜ_S^pr → ℜ_S^pr : R_S^pr(T) = T ∪
P_P^{-1}(T).

In a completely analogous way as in the previous section,
we find that R_S^pr is continuous. Hence, the least fixpoint
R_S^pr↑ω contains all atoms that can possibly occur
as a call when P is executed under the Prolog computation
rule, and when a representant of an element from S
is used as query.

Level mappings are now defined on R_S^pr↑ω. Recursive
resultants are constructed using the left-to-right computation
rule. This allows us to consider only recursive
resultants of the form p(s_1, ..., s_n) <- p(t_1, ..., t_n), B_2, ..., B_m.
The analogue of recurrency with respect to a set S of
atoms, is acceptability with respect to S.

Definition 3.4 (acceptability wrt a set of atoms)

A program P is acceptable with respect to S,
if there exists a level mapping |.| with respect
to S, such that for any p(s_1, ..., s_n), representant
of an element in R_S^pr↑ω, and for any recursive
resultant p(s_1, ..., s_n)θ <- p(t_1, ..., t_n), B_2, ..., B_m:
|p(s_1, ..., s_n)| > |p(t_1, ..., t_n)|.
Theorem 3.5

P is acceptable with respect to S if and only if it is
left-terminating with respect to S.

As in section 2, we provide a more practical, sufficient 
condition. The result is completely analogous to propo- 
sition 2.18. 

Proposition 3.6 

Let S ⊆ B_P^E and |.| a level mapping with respect to S,
such that for any minimal cyclic collection of P (after
standardizing apart),

A_1 <- B^1_1, ..., B^1_{i_1}, A^1_2, ..., B^1_{n_1}
  ⋮
A_m <- B^m_1, ..., B^m_{i_m}, A^m_{m+1}, ..., B^m_{n_m}

and for any Ã_1, ..., Ã_m ∈ R_S^pr↑ω, with A''_1, ..., A''_m
as their respective representants, and with θ_j =
mgu(A_j, A''_j) (1 ≤ j ≤ m) and σ^j_k a computed answer
substitution for (P, <- B^j_k θ_j σ^j_1 ⋯ σ^j_{k-1}) (1 ≤ k ≤ i_j),
the following condition holds:

|A^1_2 θ_1 σ^1_1 ⋯ σ^1_{i_1}| ≥ |A''_2|
  ⋮
|A^{m-1}_m θ_{m-1} σ^{m-1}_1 ⋯ σ^{m-1}_{i_{m-1}}| ≥ |A''_m|
  ⇓
|A''_1| > |A^m_{m+1} θ_m σ^m_1 ⋯ σ^m_{i_m}|.

Then, P is acceptable with respect to S.



4 Practicality and automation 

A fully automated technique needs to address the follow- 
ing issues: 

• safe approximations of R_S↑ω and R_S^pr↑ω must be
computed,

• precise and natural level mappings are needed, and

• the conditions in propositions 2.18 and 3.6 must be
automatically verifiable.

For left termination, there is one extra issue:

• some properties of the answer substitutions for the
atoms in R_S^pr↑ω are needed; in particular, after
application of a computed answer substitution we want
an estimation of the relationship between the sizes
of the arguments of the atoms in R_S^pr↑ω.

Concerning the first issue, observe that in practice, the
sets of atoms S in the framework are likely to be specified
in terms of call patterns over some abstract domain. The
framework contains no implicit restriction on the kind of
abstractions that are used for this purpose. They could
express either mode or type information, or even combined
mode and type information, as in the rigid or integrated
types of [Janssens and Bruynooghe 1990].
Abstract interpretation can be applied to automatically
infer a safe approximation of $R_S\uparrow\omega$ or $R^{l}_{S}\uparrow\omega$ (see
[Janssens and Bruynooghe 1990]).

Automated techniques for proving termination use
various types of norms. A norm is a mapping $\|.\| : U_P \rightarrow \mathbb{N}$.
Several examples of norms can be found in the literature.
When dealing with lists, it is often appropriate to use
list-length, which gives the depth of the rightmost branch
in the tree representation of the term. A more general norm
is term-size, which counts the number of function symbols
in a term. Another frequently used norm is term-depth,
which gives the maximum depth of (the tree representation
of) a term.
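
The three norms just mentioned can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the term encoding (variables as strings, compound terms and constants as tuples, lists built from '.'/2 and '[]') and all function names are hypothetical, list-length is taken in its usual form that counts list cells along the tail, and constants are counted as function symbols of arity 0 in term-size.

```python
# Terms (illustrative encoding, not taken from the paper):
#   a variable is a str; a constant or compound term is a tuple
#   (functor, arg1, ..., argn); lists use '.'/2 and the constant '[]'.

def is_var(t):
    return isinstance(t, str)

def mk_list(items, tail=('[]',)):
    """Build the term [i1, ..., ik | tail]."""
    for x in reversed(items):
        tail = ('.', x, tail)
    return tail

def list_length(t):
    """Number of list cells met when following the tail of t."""
    n = 0
    while not is_var(t) and t[0] == '.' and len(t) == 3:
        n += 1
        t = t[2]
    return n

def term_size(t):
    """Number of function symbols in t (constants included, by assumption)."""
    if is_var(t):
        return 0
    return 1 + sum(term_size(a) for a in t[1:])

def term_depth(t):
    """Maximum depth of the tree of t; variables and constants have depth 0."""
    if is_var(t) or len(t) == 1:
        return 0
    return 1 + max(term_depth(a) for a in t[1:])
```

For the list [a, b, c] this gives list-length 3, term-size 7 and term-depth 3, while the open list [a | T] has list-length 1.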

However, we restrict ourselves to semi-linear norms,
which were defined in [Bossi et al. 1991].

Definition 4.1 (semi-linear norm)

A norm $\|.\|$ is semi-linear if it satisfies the following
conditions:

• $\|V\| = 0$ if V is a variable, and

• $\|f(t_1,\ldots,t_n)\| = c + \|t_{i_1}\| + \cdots + \|t_{i_m}\|$, where $c \in \mathbb{N}$,
$1 \le i_1 < \cdots < i_m \le n$ and $c, i_1, \ldots, i_m$ depend only
on $f/n$.

Examples of semi-linear norms are list-length and 
term-size. 

As was pointed out in [Bossi et al. 1991], proving ter- 
mination is significantly facilitated if the norm of a term 
remains invariant under substitution. Such terms are 
called rigid. 

Definition 4.2 (rigid term; see [Bossi et al. 1991])
Let $\|.\|$ be a (semi-linear) norm. A term t is rigid with
respect to $\|.\|$ if for any substitution $\sigma$, $\|t\sigma\| = \|t\|$.

Rigidity is a generalisation of groundness; by using this 
concept it is possible to avoid restricting the definition of 
a norm to ground terms only, a restriction that is often 
found in the literature. 
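
To make the two definitions concrete, the following Python sketch represents a semi-linear norm by a table from functor/arity to the constant c and the selected argument positions, and tests a simple sufficient condition for rigidity: no variable occurs at a position the norm inspects. The encoding, the table format and the names `norm`, `has_selected_var` and `substitute` are assumptions made for this illustration only.

```python
# A semi-linear norm as a table: (functor, arity) -> (c, selected positions).
# Symbols that are not listed get c = 0 and no selected positions.
LIST_LENGTH = {('.', 2): (1, (2,))}

def is_var(t):
    return isinstance(t, str)          # variables are strings (assumption)

def norm(t, spec):
    """||V|| = 0 for variables; ||f(t1..tn)|| = c + sum of selected ||ti||."""
    if is_var(t):
        return 0
    c, positions = spec.get((t[0], len(t) - 1), (0, ()))
    return c + sum(norm(t[i], spec) for i in positions)

def has_selected_var(t, spec):
    """True if a variable occurs where the norm actually looks.
    If this returns False, no substitution can change the norm of t."""
    if is_var(t):
        return True
    _, positions = spec.get((t[0], len(t) - 1), (0, ()))
    return any(has_selected_var(t[i], spec) for i in positions)

def substitute(t, subst):
    """Apply a substitution given as a dict from variable names to terms."""
    if is_var(t):
        return subst.get(t, t)
    return (t[0],) + tuple(substitute(a, subst) for a in t[1:])
```

Under list-length, the nil-terminated list [X, Y] is rigid although it is not ground: instantiating X or Y leaves its norm at 2. The open list [X | T] is not rigid, since binding T changes the norm.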

Given a semi-linear norm and a set of atoms S, a very
natural level mapping with respect to S can be associated
with them.

Definition 4.3 (natural level mapping) 

Given is a semi-linear norm $\|.\|$ and a set of atoms S.
$|.|_{nat}$, the natural level mapping induced by S, is defined
as follows: $\forall p(t_1,\ldots,t_n) \in R_S\uparrow\omega$:

$|p(t_1,\ldots,t_n)|_{nat} = \sum_{i \in I} \|t_i\|$, if $I \neq \emptyset$

$= 0$ otherwise,

with $I = \{i \mid \forall p(u_1,\ldots,u_n) \in R_S\uparrow\omega : u_i \text{ is rigid}\}$.

Let us illustrate the practicality of such mappings — 
and of the framework itself — with some examples. 

Example 4.4

Reconsider example 1.4 from the introduction. Assume
that S = {p(x) | x is a nil-terminated list}. Let $\|.\|_l$ be
the list-length norm. The argument positions of all atoms
in $R_S\uparrow\omega$ are rigid under this norm. So, $|p(x)|_{nat} = \|x\|_l$
and $|q(x)|_{nat} = \|x\|_l$. The program is directly recursive,
so that it suffices to verify the conditions of definition
2.10.

For the clause p([H|T]) ← q([H|T]), p(T) and for each
call $p(x) \in R_S\uparrow\omega$, with $\theta = mgu(x, [H|T])$, we have
$|p(x)|_{nat} > |p(T)\theta|_{nat}$. By the same argument, the
condition on the clause q([H|T]) ← q(T) holds as well. Thus,
the program is recurrent with respect to S under the
natural, list-length level mapping with respect to S.

As a second example, we take a program with indirect
recursion. It defines some form of well-formed expressions
built from integers and the function symbols +/2,
*/2 and -/1.

Example 4.5 



    e(X+Y)   ← f(X), e(Y).     (cl1)
    e(X)     ← f(X).           (cl2)
    f(X*Y)   ← g(X), f(Y).     (cl3)
    f(X)     ← g(X).           (cl4)
    g(-(X))  ← e(X).           (cl5)
    g(X)     ← integer(X).     (cl6)

The obvious choice for a level mapping for this program is 
term-size. However, the program is not recurrent in the 
sense of [Bezem 1989] with respect to this norm. Since it 
is clearly terminating, a level mapping exists. The most 
natural mapping (in the sense of [Bezem 1989]) we were 
able to come up with is: 

    |e(x)| = 3 × term-size(x) + 2
    |f(x)| = 3 × term-size(x) + 1
    |g(x)| = 3 × term-size(x).

In the context of our framework, consider the set S =
{e(x) | x is ground}. Through abstract interpretation,
we can find that $R_S\uparrow\omega \subseteq B_P$.

Let $\|.\|_t$ be the term-size norm. Again, the argument
positions of all atoms in $R_S\uparrow\omega$ are rigid (even ground)
under this norm. Thus, $|e(x)|_{nat} = \|x\|_t$, $|f(x)|_{nat} = \|x\|_t$
and $|g(x)|_{nat} = \|x\|_t$. The program contains essentially¹
6 minimal cyclic collections: (cl1), (cl3), (cl1, cl3, cl5),
(cl1, cl4, cl5), (cl2, cl3, cl5), (cl2, cl4, cl5).

Let us consider, as an example, the third collection: 

    e(X+Y)     ← f(X), e(Y).
    f(X'*Y')   ← g(X'), f(Y').
    g(-(X''))  ← e(X'').

¹ Since collections are sequences of clauses, cyclic permutations
should be considered as well.



Assume that e(x), f(y) and g(z) are any atoms with
ground terms x, y and z, and that:

$\theta_1 = mgu(e(x), e(X+Y))$
$\theta_2 = mgu(f(y), f(X'*Y'))$
$\theta_3 = mgu(g(z), g(-(X'')))$.

Also assume that $|f(X)\theta_1| \ge |f(y)|$ and $|g(X')\theta_2| \ge |g(z)|$.
We then have
$|e(x)| > |f(X)\theta_1| \ge |f(y)| > |g(X')\theta_2| \ge |g(z)| > |e(X'')\theta_3|$,
so that $|e(x)| > |e(X'')\theta_3|$, and the conditions of proposition 2.18 (for the
third cycle) are fulfilled. All other cycles can be verified
in a similar way. The conclusion is that the program is
recurrent with respect to S and the very natural term-size
level mapping.
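
The contrast between plain term-size and the weighted mapping 3 × term-size + k can be checked mechanically on sample ground instances. The Python sketch below is an illustration only: it tests a handful of ground terms rather than proving anything, it omits (cl6) because integer/1 is treated as a built-in, and it assumes that an integer counts as one function symbol. All names are hypothetical.

```python
def size(t):
    """term-size: one unit per function symbol; an integer counts as a constant."""
    if isinstance(t, int):
        return 1
    return 1 + sum(size(a) for a in t[1:])

def clauses(x, y):
    """Ground instances of (cl1)-(cl5): (name, head pred, head arg, body atoms)."""
    return [
        ('cl1', 'e', ('+', x, y), [('f', x), ('e', y)]),
        ('cl2', 'e', x,           [('f', x)]),
        ('cl3', 'f', ('*', x, y), [('g', x), ('f', y)]),
        ('cl4', 'f', x,           [('g', x)]),
        ('cl5', 'g', ('-', x),    [('e', x)]),
    ]

def failing(factor, offset, terms):
    """Clauses having a sampled instance whose head level does not
    strictly exceed the level of some body atom."""
    def level(p, t):
        return factor * size(t) + offset[p]
    bad = set()
    for x in terms:
        for y in terms:
            for name, hp, ht, body in clauses(x, y):
                if any(level(hp, ht) <= level(bp, bt) for bp, bt in body):
                    bad.add(name)
    return sorted(bad)

TERMS = [1, ('-', 1), ('+', 1, 2), ('*', ('-', 3), 4)]
```

With `factor=3` and offsets e: 2, f: 1, g: 0 no clause fails on the samples, whereas with plain term-size (`factor=1`, all offsets 0) the clauses (cl2) and (cl4) fail, because head and body atom then have the same level.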

In the context of left termination, definition 4.3 can be
adapted to produce equally natural level mappings with
respect to a set S. Obviously, $R_S\uparrow\omega$ should be replaced
by $R^{l}_{S}\uparrow\omega$. In the context of left termination there is
an extra issue, namely, (an approximation of) the set of
possible answer substitutions for an atom is needed. The
next example illustrates how this is handled.

Example 4.6 

    p([],[]).
    p([H|T],[G|S])     ← d(G,[H|T],U), p(U,S).
    d(H,[H|T],T).
    d(G,[H|T],[H|U])   ← d(G,T,U).

Assume that S = {p(x,y) | x is a nil-terminated list and
y is free}. Notice that $R_S\uparrow\omega$ contains the set {p(x,y) | x
and y are free variables}. We are not able to define a level
mapping on $R_S\uparrow\omega$ that can be used to prove recurrency
with respect to S. This is not surprising, since P is not
terminating with respect to S.

However, program P is left terminating with respect
to S. We prove this by showing that P is acceptable
with respect to S. The set $R^{l}_{S}\uparrow\omega$ is the union
of {p(x,y) | x is a nil-terminated list and y is free}
and {d(x,y,z) | x and z are free variables and y is a
nil-terminated list}. This can be found by using abstract
interpretation. Since there is only direct recursion
in program P, it suffices to show that: (1) for
any $p(x,y) \in R^{l}_{S}\uparrow\omega$, $|p(x,y)| > |p(U,S)\theta\sigma|$, where
$\theta = mgu(p(x,y), p([H|T],[G|S]))$ and $\sigma$ is a computed
answer substitution for $(P, \leftarrow d(G,[H|T],U)\theta)$, and (2)
for any $d(x,y,z) \in R^{l}_{S}\uparrow\omega$, $|d(x,y,z)| > |d(G,T,U)\theta|$,
where $\theta = mgu(d(x,y,z), d(G,[H|T],[H|U]))$.

Now, in practice, the statement "$\sigma$ is a computed answer
substitution for $(P, \leftarrow d(G,[H|T],U)\theta)$" can be
replaced by "$\|[H|T]\theta\sigma\|_l = \|U\theta\sigma\|_l + 1$". This latter
statement is a so-called linear size relation, which
expresses a relation between the norms of the arguments
of the atoms in the success set of the program. Alternatively,
it can also be interpreted as a (non-Herbrand)
model of the program. For more details we refer to
[Verschaetse and De Schreye 1992], where we describe
an automated technique for deriving linear size relations.

By taking this information into account, and by taking
$|p(x,y)| = \|x\|_l$ for any $p(x,y) \in R^{l}_{S}\uparrow\omega$ (notice that x
is rigid with respect to $\|.\|_l$), we find:
$|p(x,y)| = \|x\|_l = \|x\theta\sigma\|_l = \|[H|T]\theta\sigma\|_l = \|U\theta\sigma\|_l + 1 > \|U\theta\sigma\|_l = |p(U,S)\theta\sigma|$.

The second inequality, $|d(x,y,z)| > |d(G,T,U)\theta|$, is
easier to prove. This time, the list-length of the
second argument can be taken as level mapping. Since
both inequalities hold, we can conclude that the program
is acceptable with respect to the set of atoms that is
considered.
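
The role of the linear size relation in this example can be illustrated by executing the two predicates directly. The Python sketch below is an assumption-laden illustration, not the authors' analysis: `delete` enumerates the answers of d(G,Ys,U) for a given nil-terminated list Ys, and `perm` evaluates p(Xs,S) from left to right, checking at run time that every answer of d shortens the list by exactly one before the recursive call.

```python
def delete(ys):
    """All answers (G, U) of d(G, ys, U) for a nil-terminated list ys."""
    if not ys:
        return
    h, t = ys[0], ys[1:]
    yield h, t                        # d(H,[H|T],T).
    for g, u in delete(t):            # d(G,[H|T],[H|U]) <- d(G,T,U).
        yield g, [h] + u

def perm(xs):
    """All answers S of p(xs, S), solving the body atoms left to right."""
    if not xs:
        yield []                      # p([],[]).
        return
    for g, u in delete(xs):           # d(G,[H|T],U) is solved first
        assert len(xs) == len(u) + 1  # linear size relation on the answer
        for s in perm(u):             # p(U,S): first argument is shorter
            yield [g] + s
```

Because the first argument of the recursive call is always one element shorter, the recursion in `perm` is bounded by the length of the input list, which mirrors the inequality $|p(x,y)| > |p(U,S)\theta\sigma|$.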

Automatic verification of the conditions for recurrency 
and acceptability is handled by reformulating them into 
a problem of checking the solvability of a linear system of 
inequalities. This part of the work is described in more 
detail in [De Schreye and Verschaetse 1992]. 

Abstract

We present an efficient technique for the automatic genera- 
tion of termination proofs for concurrent logic programs, 
taking Guarded Horn Clauses (GHC) as an example. In con- 
trast to Prolog's strict left to right order of evaluation, termi- 
nation proofs for concurrent languages are complicated by a 
more sophisticated mechanism of subgoal selection. We in- 
troduce the notion of directed GHC programs and show that 
for this class of programs goal reductions can be simulated 
by Prolog-like derivations. We give a sufficient criterion for 
directedness. Static program analysis techniques developed 
for Prolog can thus be applied, albeit with some important 
modifications. 

1. Introduction 

With regard to termination it is useful to distinguish between 
two types of software systems or programs: transformational 
and reactive [HAP85]. A transformational system receives 
an input at the beginning of its operation and yields an output 
at the end. If the problem at hand is decidable, termination of 
the process is surely a desirable property. Reactive systems, 
on the other hand, are designed to maintain some interaction 
with their environment. Some of them, for instance op- 
erating systems and database management systems, ideally 
never terminate and do not yield a final result at all. Based on 
the process interpretation of Horn clause logic, concurrent 
logic programming systems have been designed for many 
different applications including reactive systems and
transformational parallel systems. While for some of them
termination is not a desirable property, for others it is. In this
paper we discuss how termination proofs for concurrent logic
programs can be generated automatically.

Automatic proof techniques for pure Prolog programs 
have been described in several papers including [ULG88] 
and [PLU90a]. Prolog is characterized by a fixed 
computation rule which always selects the leftmost atom. 
Deterministic subgoal selection and strict left to right order of 
evaluation cannot be assumed for the concurrent languages. 

Static program analysis techniques, which are well established
for sequential Prolog, such as abstract interpretation,
inductive assertions and termination proof techniques,
substantially depend on the strict left to right order of evaluation
in most cases and thus cannot easily be applied to concurrent
languages. Concurrent languages delay subgoals which are
not sufficiently instantiated. Goals which loop forever when 
evaluated by a Prolog interpreter may deadlock in the context 
of a concurrent language. These phenomena may suggest 
that termination proofs for concurrent logic programs require 
a different approach. This paper, however, shows that 
techniques which have been established for pure Prolog are 
still useful in the context of concurrency. 

Our starting point is the question under which conditions 
reductions of a concurrent logic program can be simulated by 
Prolog-like derivations. We take Guarded Horn Clauses 
(GHC, see [UED86]) as an example, but our results can 
easily be extended to other concurrent logic programming 
languages such as PARLOG, (Flat) Concurrent Prolog or 
FCP(:). Our basic assumptions are the restriction of unifica- 
tion to input matching, nondeterministic subgoal selection 
and resuming of subgoals which are not sufficiently instan- 
tiated. Since we consider all possible derivations, the commit 
operator does not need special attention. 

In general simulation is not possible: if there is a GHC- 
derivation of g' from g, g’ cannot necessarily be derived 
with Prolog's computation rule. 

One could now try to augment simulation by program 
transformation. Let, for instance, P' be derived from P by 
including all clause body permutations. Although P’ may be 
exponentially larger than P, there are still derivations which 
are not captured. 

Example 1.1: 

Program:    p ← q,r.    q ← s,t.    r ← u,v.
            s.    v.

Goal:       ← p

This goal can be reduced to ← t,u by nondeterministic
subgoal selection, but not by a Prolog-like computation,
even after adding the following clauses:

            p ← r,q.    q ← t,s.    r ← v,u.

The reason is that in order to derive ← t,u, the subderivations
of ← q and ← r have to be interleaved.
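
Since the program of Example 1.1 is propositional and non-recursive, the claim can be checked by exhaustively enumerating derivable goals. The following Python sketch is an illustration with hypothetical names: `reachable` collects every goal derivable from a start goal, either with arbitrary subgoal selection or with Prolog's leftmost selection.

```python
def reachable(program, goal, leftmost):
    """Set of goals (tuples of atoms) derivable from `goal`.
    `program` is a list of (head, body) pairs, with body a tuple of atoms."""
    seen, todo = {goal}, [goal]
    while todo:
        g = todo.pop()
        # Leftmost selection only looks at position 0 (if the goal is non-empty).
        positions = range(min(1, len(g))) if leftmost else range(len(g))
        for i in positions:
            for head, body in program:
                if head == g[i]:
                    new = g[:i] + body + g[i + 1:]
                    if new not in seen:
                        seen.add(new)
                        todo.append(new)
    return seen

P = [('p', ('q', 'r')), ('q', ('s', 't')), ('r', ('u', 'v')),
     ('s', ()), ('v', ())]
# P extended with the permuted clause bodies from the example.
P_PERM = P + [('p', ('r', 'q')), ('q', ('t', 's')), ('r', ('v', 'u'))]
```

With arbitrary selection the goal `('t', 'u')` is reachable from `('p',)` in `P`; with leftmost selection it is not reachable even in `P_PERM`, in either order of the two atoms.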

The question arises whether there is an interesting subclass
for which appropriate simulations can be defined. Such
a class of programs will be discussed in Section 3. The main 
idea is to assume that if a subgoal p may produce some 
output on which evaluation of another subgoal q depends, 
then p is smaller w.r.t. some partial ordering. Whether a 
program maintains such a property, which we will call di- 
rectedness, is undecidable. We will then introduce the 
stronger notion of well-formedness which can be checked 
syntactically. Well-formedness is related to directionality, 
which is discussed in [GRE87]. Well-formedness is suffi- 
cient but not necessary for directedness, and it will turn out 
that quite a lot of nontrivial programs (including for instance 
systolic programs as discussed in [SHA87a] and most of the 
examples given in [TIC91]) fall into this category. In Section 
5 we will demonstrate how termination proof techniques 
which have been established for pure Prolog can be 
generalized such that they apply to well-formed GHC 
programs. 

The rest of this paper is organized as follows. Section 2 
provides basic notions. Section 3 introduces the notion of di- 
rected programs and shows that this property is undecidable. 
It provides the notion of well-formedness and shows that it 
is sufficient for directedness. Section 4 discusses oriented 
and data driven computation and shows that after some sim- 
ple program transformation derivations with directed GHC- 
programs can be simulated by Prolog-like derivations. 
Using the notion of S-models introduced in [FLP89], Sec- 
tions 5 and 6 show how termination proofs can be achieved 
automatically. 

2. Basic Notions 

We use standard notation and terminology of Lloyd [LLO87]
or Apt [APT90]. Following [APP90] we will say LD-resolution
(LD-derivation, LD-refutation, LD-tree) for SLD-resolution
(SLD-derivation, SLD-refutation, SLD-tree) with the
leftmost selection rule characteristic for Prolog.

Next we define GHC programs following [UED87] and 
[UED88]. 

A GHC program is a set of guarded Horn clauses of the 
following form: 

    H ← G_1,...,G_m | B_1,...,B_n.    (m ≥ 0, n ≥ 0)

where H, G_1,...,G_m and B_1,...,B_n are atomic formulas. H
is called a clause head, the G_i's are called guard goals and
the B_i's are called body goals. The part of a clause before '|'
is called a guard, and the part after '|' is called a body. One
predicate, namely '=', is predefined by the language. It
unifies two terms.

Declaratively, the commitment operator '|' denotes
conjunction, and the above guarded Horn clause is read as "H is
implied by G_1,...,G_m and B_1,...,B_n". The operational
semantics of GHC is given by parallel input resolution
restricted by the following two rules:

Rule of Suspension: 

• Unification invoked directly or indirectly in the guard of a 
clause C called by a goal G (i.e. unification of G with the 
head of C and any unification invoked by solving the 
guard goals of C) cannot instantiate the goal G. 

• Unification invoked directly or indirectly in the body of a 
clause C called by a goal G cannot instantiate the guard of 
C or G until C is selected for commitment. 

Rule of Commitment : 

• When some clause C called by a goal G succeeds in 
solving (see below) its guard, the clause C tries to be se- 
lected for subsequent execution (i.e., proof) of G. To be 
selected, C must first confirm that no other clauses in the 
program have been selected for G. If confirmed, C is se- 
lected indivisibly, and the execution of G is said to be 
committed to the clause C. 

An important consequence is that any unification intended
to export bindings to the calling goal must be specified in the
clause body and use the predefined predicate '='.

The operational semantics of GHC is a sound, albeit not
complete, proof procedure for Horn clause programs: if
← B succeeds with answer substitution $\theta$, then $\forall(B\theta)$ is a
logical consequence of the program.

Subsequently, we may find it convenient to denote a goal
g by the pair $\langle G;\theta\rangle$, i.e. $g = G\theta$. A single derivation step
reducing the i-th atom of G using clause C and applying mgu
$\theta'$ is denoted by $\langle G;\theta\rangle \rightarrow_{i;C} \langle G';\theta\theta'\rangle$. Subscripts may
be omitted.

3. Directed Programs 

An annotation $d_p$ for an n-ary predicate symbol p is a function
from {1,...,n} to {+,-} where '+' stands for input and '-'
for output. We will write p(+,+,-) in order to state that
the first two arguments of p are input and the last is output.
A goal atom A generates (consumes) a variable v if v occurs
at an output (input) position of A. A is a generator for B
if some variable v occurs at an output position of A and at an
input position of B; in this case, B is a consumer of A.

Let $\bar{t}$ denote a tuple of terms. A derivation
$\langle p(\bar{t});\varepsilon\rangle \rightarrow^{*} \langle G;\theta\rangle$
respects the input annotation of p if $v\theta = v$ for every
variable v occurring at an input position of $p(\bar{t})$.

A goal is directed if there is a linear ordering among its
atoms such that if $A_i$ is a generator for $A_j$ then $A_i$ precedes $A_j$
in that ordering. A program is directed if all its derivations
respect directedness, i.e., all goals derived from a directed
goal are directed. Note that directedness of a goal is a static
property which can be checked syntactically. Directedness of
a program, however, is a dynamic property.
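
The syntactic check for directedness of a goal amounts to testing that the generator/consumer relation between its atoms is acyclic. The Python sketch below shows one way to do this with a topological sort. It rests on several assumptions made for illustration: arguments are plain variable or constant names (no nested terms), variables start with an upper-case letter, annotations are strings such as '+-', only pairs of distinct atoms are considered, and the predicates r and s with annotation (+,-) are hypothetical.

```python
def is_var(a):
    return a[:1].isupper()

def directed_order(goal, ann):
    """Return atom indices in an order where generators precede consumers,
    or None if no such linear ordering exists (the goal is not directed).
    `goal` is a list of (predicate, args); `ann` maps predicate -> '+-...'."""
    n = len(goal)
    outs = [{a for a, m in zip(args, ann[p]) if m == '-' and is_var(a)}
            for p, args in goal]
    ins = [{a for a, m in zip(args, ann[p]) if m == '+' and is_var(a)}
           for p, args in goal]
    # Edge i -> j when atom i is a generator for atom j.
    succ = {i: {j for j in range(n) if j != i and outs[i] & ins[j]}
            for i in range(n)}
    indeg = {j: sum(j in succ[i] for i in range(n)) for j in range(n)}
    order, ready = [], [i for i in range(n) if indeg[i] == 0]
    while ready:                      # Kahn's topological sort
        i = ready.pop(0)
        order.append(i)
        for j in sorted(succ[i]):
            indeg[j] -= 1
            if indeg[j] == 0:
                ready.append(j)
    return order if len(order) == n else None

ANN = {'r': '+-', 's': '+-'}
```

For instance, `directed_order([('r', ('X', 'Y')), ('s', ('Y', 'Z'))], ANN)` yields an ordering, while for the goal r(X,Y), s(Y,X) each atom generates a variable the other consumes, so the function returns `None`.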

Theorem 3.1: It is undecidable whether a program is directed.

Proof: Let t_M(X) be a directed GHC simulation of a Turing
machine M for a language L which binds X to halt if and
only if M applied to the empty tape halts. Such a simulation
is for instance described in [PLU90b]. Next consider the
following procedures p_M and q:

    p_M(X,Y) ← t_M(A), q(A,X,Y).
    q(halt,X,X).

and the (directed) goal

    ← r(X,Y), s(Y,Z), p_M(X,Z).

The following annotations are given:

    t_M(-). r(+,-). s(+,-).

If M halts on the empty tape, t_M(A) will bind A to 'halt',
p_M(X,Y) will identify X and Y and thus the given goal can
be reduced to the undirected goal ← r(X,Y), s(Y,X).
Decidability of program directedness would thus imply
solvability of the halting problem: contradiction. ■

Next we introduce the notion of well-formedness of a 
program w.r.t. a given annotation and show that this prop- 
erty is sufficient for directedness. 

A goal is well-formed if it is directed, generators precede 
consumers in its textual ordering, and its output is unre- 
stricted. Output of a goal is unrestricted if all its output ar- 
guments are distinct variables which do not occur (i) at an 
output position of another goal atom and (ii) at an input po- 
sition of the same atom. 

A program P is well-formed if the following conditions
are satisfied by each clause H ← G_1,...,G_m | B_1,...,B_n in P:

• ← B_1,...,B_n is well-formed

• the input variables of H do not occur at output positions
of body atoms.

The predicate '=' has the annotation '- = -'. It is convenient
to have two related primitives: '==' (test) and '<='
(matching) which have the same declarative reading as '='
but different annotations, namely '+ == +' and '- <= +'.

Note that the goal ← r(X,Y), s(Y,Z), p_M(X,Z) is not
well-formed because its output is restricted: Z has two output
occurrences.
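
The conditions for a well-formed goal can be checked in one pass over its atoms. The Python sketch below is illustrative only: arguments are plain variable or constant names, variables start with an upper-case letter, and the annotation p_M(+,-) is an assumption (the text only implies that Z is at an output position of p_M). Requiring generators to precede consumers textually already guarantees that the goal is directed.

```python
def is_var(a):
    return a[:1].isupper()

def well_formed(goal, ann):
    """`goal` is a list of (predicate, args); `ann` maps predicate -> '+-...'."""
    seen_out = set()
    for p, args in goal:
        ins = {a for a, m in zip(args, ann[p]) if m == '+'}
        for v in (a for a, m in zip(args, ann[p]) if m == '-'):
            # Output must be unrestricted: a variable, not already used as
            # output elsewhere, and not an input of the same atom.
            if not is_var(v) or v in seen_out or v in ins:
                return False
            seen_out.add(v)
    # Generators must precede their consumers in the textual ordering.
    produced_later = set()
    for p, args in reversed(goal):
        ins = {a for a, m in zip(args, ann[p]) if m == '+'}
        if ins & produced_later:
            return False
        produced_later |= {a for a, m in zip(args, ann[p]) if m == '-'}
    return True

ANN = {'r': '+-', 's': '+-', 'p_M': '+-'}   # p_M(+,-) is assumed
```

With these annotations, r(X,Y), s(Y,Z) is accepted, while r(X,Y), s(Y,Z), p_M(X,Z) is rejected because Z occurs at two output positions.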

The next example is taken from [UED86]:

Example 1: Generating primes

    primes(Max,Ps)       ← true | gen(2,Max,Ns), sift(Ns,Ps).
    gen(N,Max,Ns)        ← N ≤ Max | N1 <= N + 1,
                           gen(N1,Max,Ns1), Ns <= [N|Ns1].
    gen(N,Max,Ns)        ← N > Max | Ns <= [].
    sift([P|Xs],Zs)      ← filter(P,Xs,Ys), sift(Ys,Zs1),
                           Zs <= [P|Zs1].
    sift([],Zs)          ← Zs <= [].
    filter(P,[X|Xs],Ys)  ← X mod P == 0 | filter(P,Xs,Ys).
    filter(P,[X|Xs],Ys)  ← X mod P ≠ 0 | filter(P,Xs,Ys1),
                           Ys <= [X|Ys1].
    filter(P,[],Ys)      ← Ys <= [].

    primes(+,-). gen(+,+,-). sift(+,-). filter(+,+,-).



The call primes(Max,Ps) returns through Ps a stream of 
primes up to Max. The stream of primes is generated from a 
stream of integers by filtering out the multiples of primes. 
For each prime P, a filter goal filter(P,Xs,Ys) is generated 
which filters out the multiples of P from the stream Xs, 
yielding Ys. 

In this example all input terms are italic and all output 
terms are bold. It can easily be seen that this program is 
well-formed. 
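
The stream-processing structure of this program carries over almost literally to Python generators, where each generator plays the role of a process producing its output stream on demand. The sketch below is only an analogy under that assumption; it is not GHC and says nothing about suspension or commitment.

```python
def gen(n, max_):
    """gen(N,Max,Ns): the stream N, N+1, ..., Max."""
    while n <= max_:
        yield n
        n += 1

def filter_(p, xs):
    """filter(P,Xs,Ys): the stream Xs without the multiples of P."""
    for x in xs:
        if x % p != 0:
            yield x

def sift(ns):
    """sift(Ns,Ps): the head of the stream is prime; filter the rest, recurse."""
    p = next(ns, None)
    if p is None:              # sift([],Zs): empty input, empty output
        return
    yield p
    yield from sift(filter_(p, ns))

def primes(max_):
    """primes(Max,Ps): all primes up to Max."""
    return list(sift(gen(2, max_)))
```

As in the GHC program, one filter is created per prime found, so the nesting depth of generators grows with the number of primes; the sketch is meant for small values of `max_`.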



Another example for a well-formed program is quicksort.
The call qsort([H|L],S) returns through S an ordered version
of the list [H|L]. To sort [H|L], L is split into two lists L1 and
L2 which are themselves sorted by recursive calls to qsort.

Example 2: Quicksort

    q1: qsort([],L)               ← L <= [].
    q2: qsort([H|L],S)            ← split(L,H,A,B),
                                    qsort(A,A1), qsort(B,B1),
                                    append(A1,[H|B1],S).
    s1: split([],X,L1,L2)         ← L1 <= [], L2 <= [].
    s2: split([X|Xs],Y,L1',L2)    ← X ≤ Y |
                                    split(Xs,Y,L1,L2),
                                    L1' <= [X|L1].
    s3: split([X|Xs],Y,L1,L2')    ← X > Y | split(Xs,Y,L1,L2),
                                    L2' <= [X|L2].
    a1: append([],L1,L2)          ← L2 <= L1.
    a2: append([H|L1],L2,L3)      ← append(L1,L2,L3'),
                                    L3 <= [H|L3'].

    split(+,+,-,-). qsort(+,-). append(+,+,-).



Theorem 3.2: Let P be a well-formed program, g a well-formed
goal and $g \rightarrow^{*} g'$ a GHC-derivation. Then g' is
well-formed.

Proof: See [PLU92].

Well-formed programs respect input annotations:

Theorem 3.3: Let $\langle p(\bar{t});\varepsilon\rangle \rightarrow^{*} \langle G';\theta\rangle$ be a derivation and v
an input variable of $p(\bar{t})$. Then $v\theta = v$.

Proof: Goal variables can only be bound by transitions
applying '=' or '<=', since in the other cases matching
substitutions are applied. Since both arguments of '=' are output,
and '<=' also binds only output variables, input variables
cannot be bound. ■

4. Oriented and Data Driven Computations

Our next aim is to show that derivations of directed pro- 
grams can be simulated by derivations which are similar to 
LD-derivations. In this context we find it convenient to use 
the notational framework of SLD-resolution and to regard 
GHC-derivations as a special case. 

We say that an SLD-derivation is data driven if for each
resolution step with selected atom A, applied clause C and
mgu $\theta$, either C is the unit clause (X = X ← true.) or C is
B ← B_1,...,B_n and A = B$\theta$. Data driven derivations are the
same as GHC derivations of programs with empty guards.
The assumption that guards are empty is without loss of
generality in this context.

Next we consider oriented computation rules. Oriented 
computation rules are similar to LD-resolution in the sense 
that goal reduction strictly proceeds from left to right. They 
are more general since the selected atom is not necessarily 
the leftmost one. However, if the selected atom is not 
leftmost, its left neighbors will not be selected in any future 
derivation step. 

More formally, we define: A computation rule R is
oriented if every derivation
$\langle G_0;\varepsilon\rangle \rightarrow \ldots \rightarrow \langle G_i;\theta_i\rangle \rightarrow \ldots$ via
R satisfies the following property: If in $G_i$ an atom $A_k$ is
selected, and $A_j$ ($j < k$) is an atom on the left of $A_k$, no
further instantiated version of $A_j$ will be selected in any
future derivation step.

Our next aim is to show that, for directed programs, any
data driven derivation can be simulated by an equivalent data
driven derivation which is oriented. To prove the following
theorem, we need a slightly generalized version of the
switching lemma given in [LLO87]. Here $g \rightarrow_{i;C;\theta} g'$
denotes a single derivation step where the i-th atom of g is
resolved with clause C using mgu $\theta$.

Lemma 4.1: Let $g_{k+2}$ be derived from $g_k$ via
$g_k \rightarrow_{i;C_{k+1};\theta_{k+1}} g_{k+1} \rightarrow_{j;C_{k+2};\theta_{k+2}} g_{k+2}$. Then there is a
derivation $g_k \rightarrow_{j;C_{k+2}';\theta_{k+1}'} g_{k+1}' \rightarrow_{i;C_{k+1}';\theta_{k+2}'} g_{k+2}'$ such
that $g_{k+2}'$ is a variant of $g_{k+2}$ and $C_{k+1}'$, $C_{k+2}'$ are variants
of $C_{k+1}$ and $C_{k+2}$, respectively.

Proof: [LLO87] The difference between this and Lloyd's
version is that the latter refers to SLD-refutations, while ours
refers to (possibly partial) derivations. His proof, however,
also applies to our version. ■

Theorem 4.2: Let P be a directed program and $\langle G_0;\varepsilon\rangle$ a
directed goal. Let $D = \langle G_0;\varepsilon\rangle \rightarrow \ldots \rightarrow \langle G_k;\theta_k\rangle$ be a data driven
derivation using the clause sequence $C_1,\ldots,C_k$. Then there is
another data driven derivation $D' = \langle G_0;\varepsilon\rangle \rightarrow \ldots \rightarrow \langle G_k';\theta_k'\rangle$
using a clause sequence $C_{i_1}',\ldots,C_{i_k}'$, where $\langle i_1,\ldots,i_k\rangle$ is a
permutation of $\langle 1,\ldots,k\rangle$, each $C_i'$ is a variant of $C_i$ and
$G_k'\theta_k'$ is a variant of $G_k\theta_k$, and D' is oriented.



Proof: Let $g_j$ be the first goal in D where orientation is
violated, i.e. there is the following situation:

    $g_i : \langle B_1,\ldots,R,\ldots,R',\ldots;\theta_i\rangle$

    $g_j : \langle B_1,\ldots,R,\ldots;\theta_j\rangle$

R' is selected in $g_i$ and R is selected in $g_j$. Now we
switch subgoal selection in $g_{j-1}$ and $g_j$ and get a new
derivation D*. In D* we look again for the first goal
violating the orientation. After a finite number of iterations,
we arrive at a derivation D' which is oriented. It remains to
be shown that D* (and thus D') is still data driven.

Note that up to $g_{j-1}$ both derivations are identical. Above,
the switching lemma implies that, from $g_{j+1}$ on, the goals of
D' are variants of those of D.

Now let Q be the selected goal of $G_{j-1}$. Since orientation
is violated for the first time in $G_j$, Q is to the right of R. (If
i = j - 1 then Q = R', and otherwise j - 1 would have the first
violation of orientation.) Since $g_{j-1} = \langle G_{j-1};\theta_{j-1}\rangle$ is directed,
$Q\theta_{j-1}$ is not a generator of $R\theta_{j-1}$ and thus $R\theta_{j-1}$ and $R\theta_j$ are
variants. Let H be the head of the clause applied to resolve R
in $\langle G_j;\theta_j\rangle$. Since D is data driven, $R\theta_{j-1} = H\sigma$ for some $\sigma$,
and so $R\theta_j = H\sigma'$ for some $\sigma'$. Thus D' is data driven. ■

Corollary 4.3: Let P be a directed program and g a di- 
rected goal. Then g has an infinite data driven derivation if 
and only if it has an infinite data driven derivation which is 
oriented. 

According to Corollary 4.3, in our context it is sufficient 
to consider data driven derivations which are oriented. Such 
derivations are still not always LD-derivations since the se- 
lected atom is not necessarily leftmost. If it is not, however, 
its left neighbors will never be reactivated in future deriva- 
tion steps; thus w.r.t. termination they can simply be 
ignored. The same effect can be achieved by a simple 
program transformation proposed in [FAL88]: 

$Pr_G(P) = \{\, p(\bar{X}) \leftarrow \;\mid\; p$ is an n-ary predicate appearing
in the body or the head of some clause of P
and $\bar{X}$ is an n-tuple of distinct variables $\}$

$Part_G(P) = P \cup Pr_G(P)$
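
The transformation is purely syntactic and easy to sketch. In the Python fragment below, a clause is a pair of a head atom and a list of body atoms, and an atom is a pair of a predicate name and a tuple of argument strings. This encoding and the function names `pr` and `part` are assumptions made for illustration.

```python
def pr(program):
    """One unit clause p(X1,...,Xn) <- for every predicate p/n that occurs
    in the head or the body of some clause of `program`."""
    preds = []
    for head, body in program:
        for name, args in [head] + list(body):
            if (name, len(args)) not in preds:
                preds.append((name, len(args)))
    return [((name, tuple(f"X{i}" for i in range(1, n + 1))), [])
            for name, n in preds]

def part(program):
    """The original clauses followed by the added unit clauses."""
    return list(program) + pr(program)

# The recursive quicksort clause, with arguments kept as plain strings.
Q2 = (('qsort', ('[H|L]', 'S')),
      [('split', ('L', 'H', 'A', 'B')),
       ('qsort', ('A', 'A1')),
       ('qsort', ('B', 'B1')),
       ('append', ('A1', '[H|B1]', 'S'))])
```

Applied to a program containing only `Q2`, `pr` produces unit clauses for qsort/2, split/4 and append/3, each with distinct variables as arguments.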

Simulation Lemma 4.4: Let $D = G_0 \rightarrow \ldots \rightarrow G_{i-1} \rightarrow G_i$ be
an oriented SLD-derivation of $G_0$ and P where
$G_{i-1} = \leftarrow B_1,\ldots,B_j,\ldots,B_n$ and

$G_i = \leftarrow (B_1,\ldots,B_{j-1},C_i^{+},B_{j+1},\ldots,B_n)\theta_i$.

$C_i^{+}$ is the body of the clause $C_i$ applied to resolve $B_j$. Then there is
an LD-derivation

$D' = G_0 \rightarrow \ldots \rightarrow G_{k-1}' \rightarrow G_k'$ with $Part_G(P)$, where

$G_{k-1}' = \leftarrow B_j,\ldots,B_n$ and

$G_k' = \leftarrow (C_i^{+},B_{j+1},\ldots,B_n)\theta_i$.

Proof: Whenever an atom B is selected in D which is not
the leftmost one, first the atoms to the left of B are resolved
away in D' with clauses in $Pr_G(P)$, and then D' resolves B in
the same way as D. ■

An immediate implication is the following: 

Theorem 4.5: If g has a nonterminating data driven oriented
derivation with P, then it has a nonterminating LD-derivation
with $Part_G(P)$.

The converse, however, is not true. Consider, for
instance, the quicksort example from above, extended by the
following clauses:

    q0: qsort(_,_).
    s0: split(_,_,_,_).
    a0: append(_,_,_).

While the LD-tree for ← qsort([2,1],X) is finite in the
context of the standard definition of qsort, this is no longer
true for the extended program. Consider the following infinite
LD-derivation:

           ← qsort([2,1],X)
    by q2: ← split([1],2,A,B), qsort(A,A1),
             qsort(B,B1), append(A1,[H|B1],S).
    by s0: ← qsort(A,A1),
             qsort(B,B1), append(A1,[H|B1],S).
    by q2: ← split(_,_,_,_), ...
    by s0: ← qsort(_,_), ...

This derivation, however, is not data driven: resolving
qsort(A,A1) in the third goal with q2 yields an mgu which is
not a matching substitution.

For data driven LD-derivations we get a stronger result: 

Theorem 4.6: There is a nonterminating data driven oriented
derivation for g with P if and only if there is a
nonterminating data driven LD-derivation for g with $Part_G(P)$.

Proof: The only-if part is implied by the simulation lemma.
For the if-part, consider a nonterminating, data driven
LD-derivation D. By removing all applications of clauses in
$Pr_G(P)$, one gets another derivation D'. D' is a nonterminating
data driven oriented derivation. ■

Restriction to LD-derivations which are data-driven 
enlarges the class of goal/program pairs which do not loop 
forever. In the general case, termination of quicksort 
requires that the first argument is a list. Termination of 
append requires that the first or the third argument is a list. 
Restriction to data-driven LD-derivation implies that no 
queries of quicksort or append (and many other procedures 
which have finite LD-derivations only for certain modes) 
loop forever. However, goals like ← append(X,Y,Z) or
← quicksort(A,B) deadlock immediately.

5. Termination Proofs 

In this section, we will give a sufficient condition for termi- 
nating data driven LD-derivations. We will concentrate on 
programs without mutual recursion. In [PLU90b] we have
demonstrated how mutual recursion can be transformed into
direct recursion. We need some further notions.

For a set T of terms, a norm is a mapping $|\ldots| : T \rightarrow \mathbb{N}$.
The mapping $\|\ldots\| : A \rightarrow \mathbb{N}$ is an input norm on (annotated)
atoms if for all $B = p(t_1,\ldots,t_n)$, $\|B\| = \sum_{i \in I} |t_i|$, where I
is a subset of the input arguments of B.

Let P be a well-formed program without mutual recursion.
P is safe if there is an input norm on atoms such that
for all clauses $c = B_0 \leftarrow B_1,\ldots,B_i,\ldots,B_n$ the following
holds: if $B_i$ is a recursive literal ($B_0$ and $B_i$ have the same
predicate symbol), $\sigma$ is a substitution the domain of which is a
subset of the input variables of $B_0$, and $\theta$ is a computed
answer for $\leftarrow (B_1,\ldots,B_{i-1})\sigma$, then $\|B_0\sigma\theta\| > \|B_i\sigma\theta\|$.

We can now state the following theorem: 

Theorem 5.1: If P is a safe program and G = ← A is
well-formed, then all data driven LD-derivations for G are finite.

Proof: By contradiction. Assume that there is an infinite
data driven LD-derivation D. Then there is an infinite
subsequence D' of D containing all elements of D starting with the
same predicate symbol p. Let $d_i$ and $d_{i+1}$ be two consecutive
elements of D' and

$d_i = \leftarrow p(t_1,\ldots,t_r), \ldots$

$d_{i+1} = \leftarrow p(t_1',\ldots,t_r'), \ldots$

and let $c_i$ =

$p(s_1,\ldots,s_r) \leftarrow B_1,\ldots,B_k, p(s_1',\ldots,s_r'), \ldots$

be the clause applied to resolve the first literal of $d_i$, and $\theta_i$ the
corresponding mgu. Then there is a computed answer
substitution $\theta'$ for $\leftarrow (B_1,\ldots,B_k)\theta_i$ such that
$p(t_1',\ldots,t_r') = p(s_1',\ldots,s_r')\theta_i\theta'$.

Since D is data driven, $\theta_i$ is a matching substitution, i.e.
$p(t_1,\ldots,t_r) = p(t_1,\ldots,t_r)\theta_i$. Since P is well-formed, Theorem
3.3 further implies $p(t_1,\ldots,t_r) = p(t_1,\ldots,t_r)\theta_i\theta'$. We also
have $p(t_1,\ldots,t_r)\theta_i\theta' = p(s_1,\ldots,s_r)\theta_i\theta'$.

Since P is a safe program,
$\|p(s_1,\ldots,s_r)\theta_i\theta'\| > \|p(s_1',\ldots,s_r')\theta_i\theta'\|$ and thus
$\|p(t_1,\ldots,t_r)\theta_i\theta'\| > \|p(t_1',\ldots,t_r')\theta_i\theta'\|$. Since the range of
$\|\ldots\|$ is a well-founded set, D' cannot be infinite.
Contradiction. ■

The next question is how termination proofs for data 
driven LD-derivations can be automated. In [PLU90b] and 
[PLU91], a technique for automatic termination proofs for 
Prolog programs is described. It uses an approximation of 
the program's semantics to reason about its operational 
behavior. The key concept is the predicate inequality, which
relates the argument sizes of the atoms in the minimal
Herbrand model of the program. Now in any program 
Part<j(P) for every predicate symbol p occurring in P there is 
a unit clause p(X). Thus the minimal Herbrand model $M_P$ of
P equals the Herbrand base $B_P$ of P, a semantics which is
not helpful. To overcome this difficulty, we will consider
S-models, which have been proposed in [FLP89] in order to
model the operational behaviour of logic programs more
closely. The S-model of a logic program P can be characterized
as the least fixpoint of an operator $T_S$ which is defined
as follows:

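$T_S(I) = \{ B \mid \exists\, B_0 \leftarrow B_1,\ldots,B_k \text{ in } P,\ \exists\, B_1',\ldots,B_k' \in I,$
$\qquad \exists\, \delta = \mathrm{mgu}((B_1,\ldots,B_k),(B_1',\ldots,B_k')),$
$\qquad \text{and } B = B_0\delta \}$.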

We need some notions defined in [BCF90] and [PLU91].
Let A be a mapping from a set of function symbols F to $\mathbb{N}$
which is not zero everywhere. A norm $|\cdot|$ for T is said to
be semi-linear if it can be defined by the following scheme:

$|t| = 0$ if t is a variable

$|t| = A(f) + \sum_{i \in I} |t_i|$ if $t = f(t_1,\ldots,t_n)$,

where $I \subseteq \{1,\ldots,n\}$ and I depends on f.

A subterm $t_i$ is called selected if $i \in I$.
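The scheme above can be illustrated with a minimal Python sketch. The term encoding (`Var`, `Fn`), the table name `LIST_LENGTH`, and the choice of the list-length norm (weight 1 for `cons`, only the tail selected) are assumptions made for this illustration and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Fn:
    symbol: str
    args: Tuple["Term", ...] = ()


Term = Union[Var, Fn]

# Hypothetical scheme: symbol -> (weight A(f), selected argument positions, 0-based).
# This instance is the usual list-length norm: only the tail of cons is selected.
LIST_LENGTH = {
    "nil": (0, frozenset()),
    "cons": (1, frozenset({1})),
}


def norm(t, scheme):
    """Semi-linear norm: 0 for variables, A(f) plus the norms of selected subterms."""
    if isinstance(t, Var):
        return 0
    # Symbols missing from the table are given weight 0 and no selected arguments.
    weight, selected = scheme.get(t.symbol, (0, frozenset()))
    return weight + sum(norm(t.args[i], scheme) for i in selected)
```

For the partial list [a,b|X], encoded as `Fn("cons", (Fn("a"), Fn("cons", (Fn("b"), Var("X")))))`, this norm counts the two list cells and ignores both the elements and the open tail.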

A term t is rigid w.r.t. a norm $|\cdot|$ if $|t| = |t\theta|$ for all
substitutions $\theta$. Let $t[v^{(i)} \leftarrow s]$ denote the term derived from t
by replacing the i-th occurrence of v by s. An occurrence $v^{(i)}$
of a variable v in a term t is relevant w.r.t. $|\cdot|$ if
$|t[v^{(i)} \leftarrow s]| \ne |t|$ for some s. Variable occurrences which
are not relevant are called irrelevant. A variable is relevant if
it has a relevant occurrence. rvars(t) denotes the multiset of
relevant variable occurrences in the term t.

Proposition 5.2: Let t be a term, $t\theta$ be a rigid term and V
be the multiset of relevant variable occurrences in t. Then for
a semi-linear norm $|\cdot|$ we have $|t\theta| = |t| + \sum_{v \in V} |v\theta|$.

Corollary 5.3: $|t\theta| \ge |t|$.

Proof: [PLU91]

For an n-ary predicate p in a program P, a linear predicate
inequality $LI_p$ has the form $\sum_{i \in I} p_i + c \ge \sum_{j \in J} p_j$, where I
and J are disjoint sets of arguments of p, and c, the offset of
$LI_p$, is either a natural number or $\infty$ or a special symbol like
$\gamma$. I and J are called input resp. output positions of p (w.r.t.
$LI_p$).

Let $M_S$ be the S-model of P. $LI_p$ is called valid (for a
linear norm $|\cdot|$) if $p(t_1,\ldots,t_n) \in M_S$ implies
$\sum_{i \in I} |t_i| + c \ge \sum_{j \in J} |t_j|$.

Let $A = p(t_1,\ldots,t_n)$. With the notations from above we
further define:

$F(A,LI_p) = \sum_{i \in I} |t_i| - \sum_{j \in J} |t_j| + c$
$V_{in}(A,LI_p) = \bigcup_{i \in I} rvars(t_i)$
$V_{out}(A,LI_p) = \bigcup_{j \in J} rvars(t_j)$
$F_{in}(A,LI_p) = \sum_{i \in I} |t_i|$
$F_{out}(A,LI_p) = \sum_{j \in J} |t_j|$

$F(A,LI_p)$ is called the offset of A w.r.t. $LI_p$.



Theorem 5.4: Let $\sum_{i \in I} p_i + c \ge \sum_{j \in J} p_j$ be a valid linear
predicate inequality, $G = {\leftarrow} p(t_1,\ldots,t_n)\sigma$ a well-formed goal,
V and W the multisets of relevant input resp. output variable
occurrences of $p(t_1,\ldots,t_n)$ and $\theta$ a computed answer for G.
Then the following holds:

i) $\sum_{i \in I} |t_i\sigma\theta| + c \ge \sum_{j \in J} |t_j\sigma\theta|$.

ii) $\sum_{v \in V} |v\sigma\theta| + F(p(t_1,\ldots,t_n),LI_p) \ge \sum_{w \in W} |w\sigma\theta|$.

Proof: According to [FLP89], $p(t_1,\ldots,t_n)\sigma\theta$ is an instance
of an atom $p(s_1,\ldots,s_n)$ in the S-model $M_S$ of P. Since the
output of G is unrestricted, $t_j\sigma\theta = s_j$ for all $j \in J$. Proposition
5.2 implies $|t_i\sigma\theta| \ge |s_i|$ for all $i \in I$. Thus

$\sum_{i \in I} |t_i\sigma\theta| + c \ge \sum_{i \in I} |s_i| + c \ge \sum_{j \in J} |t_j\sigma\theta| = \sum_{j \in J} |s_j|$

which proves the first part of the theorem. The second part is
implied by Prop. 5.2. ■

Theorem 5.4 gives a valid inequality relating variables oc- 
curring in a single literal goal. Next we give an algorithm for 
the derivation of a valid inequality relating variables in a 
compound goal. 

Algorithm 5.5: goal_inequality(G,LI,U,W,A,b)

Input: A well-formed goal G = <- B_1,...,B_n, a set LI
with one inequality for each predicate in G, and
two multisets U and W of variable occurrences.
Output: A boolean variable b which will be true if a valid
inequality relating U and W could be derived, and
an integer A which is the offset of that inequality.

begin
  M := W; A := 0; V := U;
  For i := n to 1 do:
    If M ∩ V_out(B_i,LI_p) ≠ ∅ then
      M := (M \ V_out(B_i,LI_p)) ∪ (V_in(B_i,LI_p) \ V);
      V := V \ V_in(B_i,LI_p);
      A := A + F(B_i,LI_p). fi
  If M = ∅ then b := true else b := false fi
end.

Next we show that the algorithm is correct: 

Theorem 5.6: Assume that the inequalities in LI are valid
and b is true, $\sigma$ is an arbitrary substitution such that $G\sigma$ is
well-formed and $\theta$ is a computed answer substitution for
$G\sigma$. Then $\sum_{v \in V} |v\sigma\theta| + A \ge \sum_{w \in W} |w\sigma\theta|$ holds.

Proof: See [PLU92]. 

Algorithm 5.5 takes time O(m) where m is the length of G.

[PLU90b] gives an algorithm for the automatic derivation 
of inequalities for compound goals based on and/or-dataflow
graphs which has exponential runtime in the worst case.
Algorithm 5.5 makes substantial use of the fact that G is
well-formed: each variable has at most one generator, which
makes the derivation of inequalities deterministic.
6. Derivation of inequalities for S-models

In Section 5 it has been assumed that linear inequalities are
given for the predicates of a program P. We now show how
these inequalities can be derived automatically. We assume
that P is well-formed and free of mutual recursion. Let $p <_\pi q$
if $p \ne q$ and p occurs in one of the clauses defining q.
Absence of mutual recursion in P implies that $<_\pi$ defines a
partial order which can be embedded into a linear order.
Thus there is an enumeration $\{p_1,\ldots,p_n\}$ of the predicates of
P such that $p_i <_\pi p_j$ implies $i < j$. We will process the
predicates of P in that order; thus in analyzing p we can assume
that valid inequalities have already been derived for all
predicates on which the definition of p depends. Note that a
trivial inequality with offset $\infty$ always holds.
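The enumeration described above is a topological ordering of the predicate dependency graph. The following Python sketch shows one way to compute it with the standard `graphlib` module; the function name `processing_order` and the encoding of a clause as a `(head_predicate, [body_predicates])` pair are assumptions made for this illustration.

```python
from graphlib import CycleError, TopologicalSorter


def processing_order(clauses):
    """Return the predicates so that each one appears after those it depends on.

    clauses: iterable of (head_predicate, [body_predicates]) pairs.
    Direct recursion (p occurring in its own definition) is ignored, matching
    the requirement p != q; mutual recursion raises graphlib.CycleError.
    """
    depends_on = {}
    for head, body in clauses:
        depends_on.setdefault(head, set())
        for pred in body:
            depends_on.setdefault(pred, set())
            if pred != head:
                depends_on[head].add(pred)
    # TopologicalSorter takes a mapping node -> predecessors and
    # yields predecessors before the nodes that depend on them.
    return list(TopologicalSorter(depends_on).static_order())
```

For a quicksort program in which qsort calls split, qsort and append, this places split and append before qsort, so their inequalities are available when qsort is analyzed.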

Let in(A) and out(A) denote the sets of input resp. output 
variables of an atom or a set of atoms according to the anno- 
tation of the given programs. 

Algorithm 6.1: predicate_inequalities(P,LI):

Input: A well-formed program P defining p_1,...,p_n.

Output: A set LI of valid inequalities for the predicates of P.

begin
  LI := ∅
  For i := 1 to n do:
  begin
    Let c_1,...,c_m be the clauses defining p_i.
    Let M, N be the input resp. output arguments of p_i.
    l_i := Σ_{μ∈M} p_μ + γ ≥ Σ_{ν∈N} p_ν.
    b_i := true.
    For j := 1 to m do:
    begin
      Let c_j be B_0 <- B_1,...,B_k.
      goal_inequality((<- B_1,...,B_k),
          LI ∪ {l_i}, V_in(B_0), V_out(B_0), A_j, b_i)
      c := A_j + F_out(B_0,l_i) - F_in(B_0,l_i).
      Φ_i := b_i
      If c contains ∞ then Φ_i := Φ_i ∧ false
(*)   elseif c is an integer then Φ_i := Φ_i ∧ (γ ≥ c)
(**)  elseif c = γ + d ∧ d ≤ 0 then Φ_i := Φ_i ∧ true
      elseif c = γ + d ∧ d > 0 then Φ_i := Φ_i ∧ false
(***) elseif c = k*γ + n ∧ k > 1
          then Φ_i := Φ_i ∧ (γ ≤ n/(1-k)).
    end
    If Φ_i is satisfiable then let δ_i be the smallest value for
        γ which satisfies Φ_i
    else let δ_i be ∞.
    Replace γ in l_i by δ_i.
    LI := LI ∪ {l_i}
  end
end
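The case analysis on c in Algorithm 6.1 amounts to collecting bounds on γ and choosing the smallest admissible value. The Python sketch below illustrates only that step, under stated assumptions: each clause contributes a pair `(k, d)` standing for c = k*γ + d with non-negative integer k (so `(0, d)` is an integer c), or `None` when c contains ∞. The function name `smallest_offset` and this encoding are hypothetical, and the handling of the boolean b returned by goal_inequality is omitted.

```python
import math


def smallest_offset(constraints):
    """Smallest natural number gamma with gamma >= k*gamma + d for every (k, d).

    Returns math.inf when the constraints are unsatisfiable, mirroring the
    'else let delta_i be infinity' branch of the algorithm.
    """
    lower = 0          # gamma ranges over the natural numbers
    upper_bounds = []  # (k, d) pairs with k > 1, i.e. gamma <= d / (1 - k)
    for constraint in constraints:
        if constraint is None:           # c contains infinity
            return math.inf
        k, d = constraint
        if k == 0:                       # case (*): gamma >= c
            lower = max(lower, d)
        elif k == 1:                     # case (**) if d <= 0, otherwise false
            if d > 0:
                return math.inf
        else:                            # case (***): gamma <= d / (1 - k)
            upper_bounds.append((k, d))
    for k, d in upper_bounds:
        # gamma <= d / (1 - k) is equivalent to gamma * (k - 1) <= -d for k > 1.
        if lower * (k - 1) > -d:
            return math.inf
    return lower
```

With two clauses yielding γ ≥ 0 and two yielding 'true', as reported for split in Section 7, the call `smallest_offset([(0, 0), (0, 0), (1, 0), (1, 0)])` gives 0; the concrete values of d used here are illustrative.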

Theorem 6.2: The inequalities derived by the algorithm 
are valid. 

Proof: By induction on the number of predicates n in P.

The case n = 0 is immediate. For the inductive case, assume
that the derived inequalities for the predicates $p_1,\ldots,p_{n-1}$ are
valid. Let $I_0$ be the minimal S-model of P restricted to the
predicates $p_1,\ldots,p_{n-1}$. In the context of the program which
consists of the definition of $p_n$ only, let $T_S^0 = I_0$ and
$T_S^m = T_S(T_S^{m-1})$. Its limes equals the minimal S-model of P
restricted to the predicates $p_1,\ldots,p_n$. Now we have to show
that the inequality $l_i$ derived for $p_n$ is valid w.r.t. $T_S^m$. The
proof is now by induction on m. The case m = 0 is implied
by the induction assumption on n. Assume that the theorem
holds for m - 1. We have to show that the inequality for $p_n$
holds for the elements of $T_S^m$. Now let $B \in T_S^m$ and
$B_0 \leftarrow B_1,\ldots,B_k$ be the clause applied to derive B. We have
$B = B_0\theta$, where $\theta$ is a computed answer substitution for
$\leftarrow B_1,\ldots,B_k$, which is a well-formed goal. Let $V = in(B_0)$
and $W = out(B_0)$. Let LI be the set of inequalities derived by
Algorithm 6.1, and A be the result of calling
goal_inequality((<- B_1,...,B_k),LI,V,W,A,b_i). Theorem 5.6
and the induction assumption imply

(‡) $\sum_{v \in V} |v\theta| + A \ge \sum_{w \in W} |w\theta|$.

Since $B = B_0\theta$, we have $F_{in}(B,l_i) = F_{in}(B_0,l_i) + \sum_{v \in V} |v\theta|$
and $F_{out}(B,l_i) = F_{out}(B_0,l_i) + \sum_{w \in W} |w\theta|$. Let $\alpha$ be the
offset of $l_i$. We have to show

(‡‡) $F_{in}(B,l_i) + \alpha \ge F_{out}(B,l_i)$.

If $b_i$ is false or A is $\infty$, we are done since in that case $\alpha$ is $\infty$.
Three more cases remain. (*) and (**) immediately imply

(‡‡‡) $\alpha \ge A + F_{out}(B_0,l_i) - F_{in}(B_0,l_i)$.

(***) implies $\alpha \le n/(1-k)$ and thus $\alpha \ge n + k*\alpha$ for some n
such that $n + k*\alpha = A + F_{out}(B_0,l_i) - F_{in}(B_0,l_i)$. Again
(‡‡‡) follows. (‡) and (‡‡‡) together now imply (‡‡). ■

Note that Algorithm 6.1 again has run-time complexity
O(n), where n is the length of the given program P.

Algorithm 6.1 is not yet able to derive $p_1 \ge p_2$ for a unit
clause like p(X,Y) with mode(p(+,-)). This inequality, however,
holds since in a well-formed goal the output argument
of p will always be unbound. To overcome this difficulty,
we assume that before calling predicate_inequalities(P,LI), P
will be transformed to P' in the following way:

Define $freevars(B_0 \leftarrow B_1,\ldots,B_n) =$
$(out(B_0) \setminus out(B_1,\ldots,B_n)) \cup (in(B_1,\ldots,B_n) \setminus in(B_0))$.

Now for the clause $c = B_0 \leftarrow B_1,\ldots,B_n$ in P let freevars(c)
$= \{Y_1,\ldots,Y_m\}$. Replace c by $B_0 \leftarrow q(Y_1,\ldots,Y_m),B_1,\ldots,B_n$
where a new predicate q is defined by the unit clause
$q(X_1,\ldots,X_m)$ with mode(q(+,...,+)). Note that, after that
transformation, P' is well-formed if P is well-formed, and if
an inequality is valid for P' it is valid for P as well. In the
example mentioned above, input for Algorithm 6.1 will be
the program P' = {q(X). , p(X,Y) <- q(Y)} and the output
will be $\{0 \le q_1,\ p_1 \ge p_2\}$.

Another improvement can be made by considering subsets of 
the input arguments in order to achieve stronger inequalities. 
This, however, makes the algorithm less efficient. 
7. Example

We finally discuss how, with the techniques given so far, it 
can be shown that the GHC program for quicksort specified 
in Section 3 terminates for arbitrary goals. 

Corollary 4.3 and Theorem 4.5 imply that it suffices to
consider data-driven LD-derivations of the extended program
for qsort including the clauses $s_0$, $a_0$ and $q_0$. According to
Theorem 5.1 we only have to show that the three predicates
of the program are safe. This is easy to show for split and
append; in fact, these procedures are structurally recursive. It
is more difficult to prove for qsort because in $q_2$ both
recursive calls contain the local variables A and B. For this
reason we need a linear predicate inequality for split which
has the form $split_1 + \gamma \ge split_3 + split_4$. After the
transformation mentioned at the end of the last section, $s_0$
will have the following form:

$s_0$: $split(L_1,L_2,L_3,L_4) \leftarrow q(L_3,L_4)$

Now $s_0$ and $s_1$ give $\gamma \ge 0$ (case * in Algorithm 6.1), while $s_2$
and $s_3$ give 'true' (case **). Thus we get $split_1 + 0 \ge split_3 +
split_4$. In order to prove safety of qsort, we only have to
consider $q_2$. Using this inequality, Algorithm 5.5
immediately shows $\|qsort([H|L],S)\theta\| > \|qsort(A,A_1)\theta\|$ and
$\|qsort([H|L],S)\theta\| > \|qsort(B,B_1)\theta\|$ for all answer
substitutions $\theta$ for split(H,L,A,B). Thus qsort is safe.
Abstract

Approaches to learning by examples have focused on gener- 
ating general knowledge from a lot of examples. In this paper 
we describe a new learning method, called analogical gener- 
alization, which is capable of generating a new rule which 
specifies a given target concept from a single example and 
existing rules. Firstly we formulate analogical generalization 
based on the similarity between a given example and existing 
rules from the logical viewpoint. Secondly, we give a new pro- 
cedure of inductive learning with analogical generalization, 
called ANGEL. The procedure consists of the following five 
steps: (1) extending a given example, (2) extracting atoms 
from the example and selecting a base rule out of the set of 
existing rules, (3) generalizing the extracted atoms by means
of the selected rule as a guide, (4) replacing predicates, and
(5) generating a rule. Through an experiment with a system
for parsing English sentences, we have clarified that ANGEL
is useful for acquiring rules in knowledge-based systems.

1 Introduction 

Machine learning contributes greatly to improving performance
through automated knowledge acquisition and refinement,
and various machine learning paradigms have been
considered so far. In particular, learning from
examples, which forms general knowledge from specific
cases given as input examples, has been well studied, and
many related methods have been proposed [Mitchell 1977,
Dietterich and Michalski 1983, Ohkawa et al. 1991].

Generally, in learning from examples, we have to give a 
lot of examples to the learner. Why are so many examples 
required? We think the reason for this is that the bias for 
restricting the generalization is relatively weak, because it is 
independent of the domain. However, when a human being 
acquires new knowledge, he would not always require a lot of 
examples. As the case may be, he can learn from one exam- 
ple. We think this is because he decides a strong bias for the 
generalization according to the domain, and generalizes the 
examples based on the bias. That is, in order to generalize a 
few examples appropriately, a strong bias which depends on 
the domain is indispensable. 

It is necessary to consider how the strong bias should 
be provided. Let us recall the behavior of a human being 
again. When acquiring new knowledge, he often utilizes similar
knowledge which is already known. In other words, the
existence of similar knowledge may help him to associate
new knowledge. This process is called analogy. Analogy is
considered promising for realizing learning from a few examples.
Since analogy can be regarded as one of the most effective
ways of restricting generalization, modeling its process
will make it possible to provide a domain dependent bias.

In this paper, we propose a new learning method, called 
ANGEL (ANalogical GEneraLization), which is capable of 
generating a new rule from a single example. In ANGEL, 
both the rules and the examples are represented as logical for- 
mulas. We introduce the notion of analogy [Winston 1980], 
namely, the similarity between the example and the exist- 
ing rules as the bias for the generalization[Mori et al. 1991]. 
The similarity is determined by comparing the atoms of both 
the example and the existing rules. Based on the similarity, 
firstly, ANGEL extracts atoms from the example and selects 
a rule out of the existing rules; next, it generates a new rule 
by generalizing the extracted atoms by means of the selected 
rule as a guide. 

The next section describes the definition of analogical gen- 
eralization. In this section we consider analogical generaliza- 
tion from the logical viewpoint. Section 3 gives the procedure 
of ANGEL which is a method for learning based on analogi- 
cal generalization. In this section, we also give consideration 
to the experimental result of learning by ANGEL. Finally in 
section 4, we clarify the originality of ANGEL through its 
comparison to other related works. 

2 Analogical generalization 

To represent knowledge, we use the form which conforms 
to first order predicate logic. Two kinds of forms, called a 
fact and a rule, are provided. A fact is represented as an 
atom, while a rule is represented as a Horn clause, which is 
expressed in the form of 

$\alpha \leftarrow \beta_1,\ldots,\beta_n$,

where $\alpha, \beta_1,\ldots,\beta_n$ are atoms. Letting r be a rule
$\alpha \leftarrow \beta_1,\ldots,\beta_n$, we denote the consequence of rule r, namely $\alpha$, by
cons(r), and denote the premise of rule r, namely $\beta_1,\ldots,\beta_n$,
by prem(r).

The underlying notion of analogical generalization is that 
a new rule is generated by generalizing an input example, 
which consists of facts, based on the similarity between the 
example and the existing rules. Before formulating analogical 
generalization, we define the similarity between two atoms, 
and next formalize the similarity between two finite sets of
atoms. 

2.1 Similarity between two atoms 

First, we define some basic notations. A substitution is a
finite set of pairs v/t, where v is a variable, t is a term,
and the variables are distinct. Let $\theta = \{v_1/t_1,\ldots,v_n/t_n\}$
be a substitution and e be an expression, which is either a
literal or a conjunction or disjunction of literals. Then $e\theta$ is
the expression obtained from e by replacing each occurrence
of the variable $v_i$ in e by the term $t_i$. If S is a finite set
of expressions and $\theta$ is a substitution, $S\theta$ denotes the set
$\{e\theta \mid e \in S\}$.

Let $\theta$ be a substitution and S be a finite set of atoms. If $S\theta$
is a singleton, S is unifiable by $\theta$ and we write unifiable(S).

Now, we give the following two functions, and define the
similarity between atoms by means of these functions. Let
R be a set of existing rules, and $\alpha$ and $\alpha'$ be atoms.

Definition 1 (R-deducible set)

$\Phi(R, \alpha) \stackrel{\mathrm{def}}{=} \{\beta \mid R \cup \{\alpha\} \vdash \beta,\ \beta \text{ is an atom}\}$.

Definition 2 (R-similar set)

$\Psi(R, \alpha, \alpha') \stackrel{\mathrm{def}}{=} \{\beta \mid \beta \in \Phi(R, \alpha),\ \exists \beta' \in \Phi(R, \alpha'),\ unifiable(\{\beta, \beta'\})\}$.

The R-deducible set contains all the information that is newly
obtained when a certain fact becomes known. Thus the intuitive
meaning of the R-similar set is the newly obtained information
that two distinct facts have in common.
Therefore we can say that the R-similar set represents the
relevance between two facts under the background knowledge.

Definition 3 (Similarity between atoms) Let $\alpha$, $\alpha_1$
and $\alpha_2$ be atoms. If the following relation holds, $\alpha$ is more
similar to $\alpha_2$ than $\alpha_1$ with respect to R.

$\Psi(R, \alpha, \alpha_1) \subset \Psi(R, \alpha, \alpha_2)$

And if the following holds, the similarity between $\alpha$ and
$\alpha_1$ is equal to the similarity between $\alpha$ and $\alpha_2$ with
respect to R.

$\Psi(R, \alpha, \alpha_1) = \Psi(R, \alpha, \alpha_2)$

Since the R-similar set reflects the relevance between two given
facts, the similarity between a certain fact and two distinct
facts can reasonably be evaluated in terms of the subset relation
between R-similar sets.

For example, let $R_1$ be a set of rules shown as follows.

$R_1$ = {parent(x,y) <- father(x,y),
parent(x,y) <- mother(x,y),
family(x,y) <- parent(x,y),
family(x,y) <- brother(x,y),
hates(x,y) <- kills(x,y),
hates(x,y) <- hurts(x,y),
hates(x,y) <- strikes(x,y)}



Let us consider the similarity of father(x,y) to
mother(Jim, Betty) and brother(Tom, Joe). For each atom,
the following R-deducible sets are derived:

$\Phi(R_1, father(x,y)) = \{father(x,y), parent(x,y), family(x,y)\}$
$\Phi(R_1, mother(Jim, Betty)) = \{mother(Jim, Betty), parent(Jim, Betty), family(Jim, Betty)\}$
$\Phi(R_1, brother(Tom, Joe)) = \{brother(Tom, Joe), family(Tom, Joe)\}$.

The R-similar sets of father(x,y) for mother(Jim, Betty) and
brother(Tom, Joe) are as follows.

$\Psi(R_1, father(x,y), mother(Jim, Betty)) = \{parent(x,y), family(x,y)\}$
$\Psi(R_1, father(x,y), brother(Tom, Joe)) = \{family(x,y)\}$

Accordingly, father(x,y) is more similar to
mother(Jim, Betty) than to brother(Tom, Joe) with respect to
$R_1$. This result matches our intuition very well.
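The family example can be replayed with a small Python sketch of the R-deducible and R-similar sets. It is written under simplifying assumptions that go beyond the text: atoms are flat (no function symbols), every rule has exactly one premise as in the example rule set, variables are strings beginning with "?", and the variables of a non-ground atom are treated as constants during deduction. The names `deducible`, `similar`, `match` and `unifiable` are illustrative.

```python
def is_var(term):
    return term.startswith("?")


# An atom is (predicate, args); a rule is (head, premise) with a single premise.
XY = ("?x", "?y")
R1 = [
    (("parent", XY), ("father", XY)),
    (("parent", XY), ("mother", XY)),
    (("family", XY), ("parent", XY)),
    (("family", XY), ("brother", XY)),
    (("hates", XY), ("kills", XY)),
    (("hates", XY), ("hurts", XY)),
    (("hates", XY), ("strikes", XY)),
]


def match(pattern, atom):
    """One-way match of a rule premise against an atom; returns bindings or None."""
    if pattern[0] != atom[0] or len(pattern[1]) != len(atom[1]):
        return None
    binding = {}
    for p, a in zip(pattern[1], atom[1]):
        if is_var(p):
            if binding.setdefault(p, a) != a:
                return None
        elif p != a:
            return None
    return binding


def deducible(rules, atom):
    """Forward chaining from a single atom (the R-deducible set in this sketch)."""
    known, frontier = {atom}, [atom]
    while frontier:
        current = frontier.pop()
        for head, premise in rules:
            binding = match(premise, current)
            if binding is None:
                continue
            derived = (head[0], tuple(binding.get(t, t) for t in head[1]))
            if derived not in known:
                known.add(derived)
                frontier.append(derived)
    return known


def unifiable(a, b):
    """Unifiability of two flat atoms; b's variables are renamed apart first."""
    if a[0] != b[0] or len(a[1]) != len(b[1]):
        return False
    b_args = tuple(t + "'" if is_var(t) else t for t in b[1])
    subst = {}

    def walk(t):
        while is_var(t) and t in subst:
            t = subst[t]
        return t

    for s, t in zip(a[1], b_args):
        s, t = walk(s), walk(t)
        if s == t:
            continue
        if is_var(s):
            subst[s] = t
        elif is_var(t):
            subst[t] = s
        else:
            return False
    return True


def similar(rules, a, a2):
    """Atoms deducible from a that unify with some atom deducible from a2."""
    other = deducible(rules, a2)
    return {b for b in deducible(rules, a) if any(unifiable(b, c) for c in other)}
```

Comparing `similar(R1, father, brother)` with `similar(R1, father, mother)` using Python's proper-subset operator `<` corresponds to the comparison in Definition 3.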

2.2 Similarity between two finite sets of atoms 

The similarity between two finite sets of atoms is determined 
by the similarity between elements of each set. In this case, 
we also have to consider the matching between atoms in each 
set. We begin with the definition of correspondence between 
two sets of atoms. 

Definition 4 (Correspondence) Let A and B be finite
sets of atoms. A correspondence $\varphi$ of A to B is defined as
follows.

1. $\varphi$ is a relation on A and B.

2. There is a substitution $\theta$ and for all $(\alpha,\beta) \in \varphi\theta$,

$arity(\alpha) = arity(\beta)$,

$arg(\alpha,n) = arg(\beta,n)$ (n = 1,2,...),

where $arity(\alpha)$ indicates the number of arguments of $\alpha$,
and $arg(\alpha,n)$ indicates the value of the n-th argument of $\alpha$.

3. For all $\alpha \in A$, there is an atom $\beta$ such that $(\alpha,\beta) \in \varphi$.
And for all $\beta \in B$, there is an atom $\alpha$ such that
$(\alpha,\beta) \in \varphi$.

For example, let $A_1$ and $B_1$ be sets of atoms shown as
follows.

$A_1$ = {father(x,y), kills(y,z)}

$B_1$ = {mother(Jim, Betty), hurts(Betty, Jim)}

In this case, two correspondences $\varphi_1$, $\varphi_2$ of $A_1$ to $B_1$ are
obtained.

$\varphi_1$ = {(father(x,y), mother(Jim, Betty)),
(kills(y,z), hurts(Betty, Jim))}

$\varphi_2$ = {(father(x,y), hurts(Betty, Jim)),
(kills(y,z), mother(Jim, Betty))}
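The two correspondences of the example can be enumerated mechanically. The Python sketch below is a simplification and not the full Definition 4: it only considers one-to-one pairings between sets of equal size, and it assumes flat atoms whose variables are strings beginning with "?". The names `consistent_pairing` and `correspondences` are hypothetical.

```python
from itertools import permutations


def is_var(term):
    return term.startswith("?")


def consistent_pairing(pairs):
    """Return one substitution that makes all paired arguments agree, or None."""
    theta = {}
    for (_, a_args), (_, b_args) in pairs:
        if len(a_args) != len(b_args):          # arities must agree
            return None
        for s, t in zip(a_args, b_args):
            if is_var(s):
                if theta.setdefault(s, t) != t:  # a variable is bound only once
                    return None
            elif s != t:
                return None
    return theta


def correspondences(A, B):
    """All one-to-one pairings of A with B that admit a consistent substitution."""
    A, B = list(A), list(B)
    if len(A) != len(B):
        return []
    result = []
    for perm in permutations(B):
        pairs = list(zip(A, perm))
        if consistent_pairing(pairs) is not None:
            result.append(pairs)
    return result
```

Applied to the two sets of the example, the sketch finds exactly the two pairings shown above, because both bind x, y and z consistently.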

Definition 5 (Precedence of correspondence)

Let A and B be sets of atoms, $\varphi_1$ and $\varphi_2$ be two distinct
correspondences of A to B. Then

• For all $\alpha$ in A, $\alpha$ is more similar to $\beta_1$ such that
$(\alpha,\beta_1) \in \varphi_1$ than $\beta_2$ such that $(\alpha,\beta_2) \in \varphi_2$,
or the similarity between $\alpha$ and $\beta_1$ is equal to the
similarity between $\alpha$ and $\beta_2$ with respect to R, and

• There exists $\alpha$ in A, which is more similar to $\beta_1$ such that
$(\alpha,\beta_1) \in \varphi_1$ than $\beta_2$ such that $(\alpha,\beta_2) \in \varphi_2$,
with respect to R,

if and only if we say that correspondence $\varphi_1$ precedes
$\varphi_2$ with respect to R. For a correspondence $\varphi$ of A to
B, if there is no correspondence that precedes $\varphi$, we call $\varphi$ a
maximally preceding correspondence of A to B with
respect to R.

Maximally preceding correspondence represents the 
matching between the most similar atoms in two sets of 
atoms with binding variables consistently. 

In the above example, $\varphi_1$ precedes the other correspondence,
namely $\varphi_2$, with respect to $R_1$, because
father(x,y) is more similar to mother(Jim, Betty) than
hurts(Betty, Jim) and likewise kills(y,z) is more similar
to hurts(Betty, Jim) than mother(Jim, Betty). Therefore
$\varphi_1$ is a maximally preceding correspondence of $A_1$ to $B_1$ with
respect to $R_1$.

Definition 6 (Similarity between sets of atoms)

Let A, A', B and C be sets of atoms, $\varphi_B$ be a maximally
preceding correspondence of A to B with respect to R and
$\varphi_C$ be a maximally preceding correspondence of A' to C with
respect to R. Then

• For all $\alpha$ in $A \cap A'$, $\alpha$ is more similar to $\beta_B$ such that
$(\alpha,\beta_B) \in \varphi_B$ than $\beta_C$ such that $(\alpha,\beta_C) \in \varphi_C$,
or the similarity between $\alpha$ and $\beta_B$ is equal to the
similarity between $\alpha$ and $\beta_C$ with respect to R, and

• There exists $\alpha$ in $A \cap A'$, which is more similar to $\beta_B$ such
that $(\alpha,\beta_B) \in \varphi_B$ than $\beta_C$ such that $(\alpha,\beta_C) \in \varphi_C$,
with respect to R,

if and only if we say that the similarity between A and B
is stronger than the similarity between A' and C with
respect to R, denoted by

$[A : B] \succ [A' : C]$.

Now, we assume $C_1$ is the following set of atoms.

$C_1$ = {brother(Tom, Joe), strikes(Joe, Mark)}

A maximally preceding correspondence of $A_1$ to $C_1$ with
respect to $R_1$ is shown as

{(father(x,y), brother(Tom, Joe)),
(kills(y,z), strikes(Joe, Mark))},

and therefore,

$[A_1 : B_1] \succ [A_1 : C_1]$.



2.3 Formulation of analogical generalization 

In this section, we proceed to formulate analogical general- 
ization. First we give a logical consideration on analogical 
generalization under five conditions to generate a rule, dis- 
cussing these conditions briefly. 

Let $\tau$ be a non-ground atom which represents a target
concept, and E be an example, that is, a set of ground atoms
which is relevant to the target concept. Here a non-ground
atom is an atom containing variables and a ground
atom is an atom containing no variable. We assume that
E contains $\tau'$, called the target instance, such that
$unifiable(\{\tau, \tau'\})$. Let E' be the set given by removing the target
instance $\tau'$ from E, and E'' be the set of ground atoms deduced by $R \cup E$.
Analogical generalization is formulated as follows.

Definition 7 (Analogical generalization) Given
R, E, $\tau$, and if

$R \cup E' \nvdash \tau'$, (1)

then generating a rule r such that

$R \cup E' \cup \{r\} \vdash \tau'$, (2)

$R \cup E' \cup \{r\}$ is consistent, and (3)

r satisfies the following five conditions, (4)

is called analogical generalization.

• Selection condition

There is a substitution $\theta$ such that

$\Pi(r)\theta \subseteq E''$,
$cons(r)\theta = \tau'$,

where $\Pi(r)$ denotes the set of all atoms that constitute r.

• Similarity condition

There is a rule $r' (\in R)$, provided that

1. There is a correspondence of $\Pi(r')$ to $\Pi(r)\theta$, which
contains $(cons(r'), \tau')$ [1].

2. For an arbitrary set of atoms $A (\subseteq E'')$, the following
relation does not hold.

$[\Pi(r') : A] \succ [\Pi(r') : \Pi(r)\theta]$.

3. For an arbitrary rule $r'' (\in R)$ and an arbitrary set
of atoms $A (\subseteq E'')$, the following relation does not
hold.

$[A : \Pi(r'')] \succ [\Pi(r)\theta : \Pi(r')]$.

• Significance condition

For a rule r' which satisfies the similarity condition [2], letting
$\varphi$ be a correspondence of $\Pi(r')$ to $\Pi(r)\theta$,

$\bigcup_{(\alpha,\beta) \in \varphi} \Psi(R,\alpha,\beta) \ne \emptyset$.

[1] $\theta$ indicates the same substitution as in the selection condition.

[2] We call r' a base rule.
• Generality condition

For a base rule r', letting $\varphi$ be a correspondence of $\Pi(r')$
to $\Pi(r)$,

$\forall (\alpha,\beta) \in \varphi$, $arg(\alpha,n) = arg(\beta,n)$ (n = 1,2,...).

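• Applicability condition

For a base rule r', let $\varphi_1$ be a correspondence of $\Pi(r')$
to $\Pi(r)\theta$. Let $\varphi_2$ be a correspondence of $\Pi(r')$ to
$A (\subseteq E'')$ which contains $\tau'$, provided that $\varphi_2$ contains
$(cons(r'), \tau')$. For all $\alpha \in \Pi(r')$, if $R \cup \{\alpha\} \nvdash \beta_2$ or
$\{\alpha\} \vdash \beta_2$ such that $(\alpha,\beta_2) \in \varphi_2$, then $R \cup \{\beta_1\} \nvdash \beta_2$ or
$\beta_1 = \beta_2$ such that $(\alpha,\beta_1) \in \varphi_1$ has to hold.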

Since there are, in general, many rules satisfying the equa- 
tion (2) and (3), we have introduced the five conditions as 
constraints for the rule r. 

Selection condition means that the rule r is generated mak- 
ing use of predicates which are used for representing given 
examples and existing rules. 

Similarity condition is a condition for the purpose of gen- 
erating a rule which is similar to an existing rule. A base 
rule, which is the most similar rule to a given example in ex- 
isting rules, is selected appropriately due to this condition. 
Moreover, it guarantees that, with respect to the similarity, 
relevant atoms are extracted from the example for the se- 
lected base rule. That is, this condition is regarded as a bias 
depending on the domain specific knowledge. 

The similarity condition checks the validity of a base rule by a
relative comparison of the similarities between a base rule and
an example, while the significance condition examines the
absolute relevance between a base rule and an example by
means of the R-similar set. Rules not satisfying the
significance condition should be regarded as absurd rules.

Generality condition removes constants which occur in an 
example from the generated rule. It aims at the versatility 
of the generated rule. 

If an atom $\alpha$ forms a rule r and $R \cup \{\alpha\}$ is able to deduce
another atom $\alpha'$, a rule formed by the atom $\alpha'$ instead of $\alpha$
also satisfies (2) and (3). In this case, the latter
rule is more applicable than the former. The applicability
condition guarantees that the most applicable rule is adopted.

3 ANGEL 

3.1 Procedure 

This section presents ANGEL in detail. If the set of existing
rules R, an example E and a target concept $\tau$ are given,
ANGEL generates a new rule by means of analogical
generalization. We show the overview of ANGEL in Figure 1.

If R contains recursive rules, the R-deducible set may be
infinite. We therefore assume that R has no recursive rules,
so that the similarity between atoms can be computed in practice.

The procedure of ANGEL consists of five steps: (1) extending
an example, (2) extracting atoms from the example
and selecting a base rule out of the set of existing rules, (3)
generalizing the extracted atoms, (4) replacing predicates,
and (5) generating a rule. We briefly describe each step below.

Figure 1: Overview of ANGEL

STEP1 Extending an example

Generate a set of ground atoms which are deduced by
$R \cup E$ and denote it by E. If an atom $\alpha (\in E)$ can be
deduced by $R \cup \{\alpha'\}$ ($\alpha' \ne \alpha$, $\alpha' \in E$), remove the atom
$\alpha$ from E.

STEP2 Extracting atoms and selecting a base rule

For each rule $r' \in R$, make correspondences of $\Pi(r')$ to A
which is an arbitrary subset of E. At this time, cons(r')
will certainly correspond to the target instance. If a set
$A' (\ne A)$ such that

$[\Pi(r') : A'] \succ [\Pi(r') : A]$,
$A' \subseteq E$

does not exist, regard the correspondence of $\Pi(r')$ to
A as a candidate of useful correspondence; otherwise
abandon the set A. Note that sets once abandoned for
a certain rule are never adopted for other rules.

For all candidates of useful correspondences, evaluate
the similarities between subsets of an example and rules.
And if a correspondence of A' to $\Pi(r'')$ such that

$[A' : \Pi(r'')] \succ [A : \Pi(r')]$,
$A' \subseteq E$,
$r'' \in R$

does not exist, adopt the correspondence of A to $\Pi(r')$
as a useful correspondence.

STEP3 Generalizing atoms

Generalization is performed by turning constants to
variables. As a result of STEP2, there is at least one
useful correspondence $\varphi$ of $\Pi(r')$, in which r' is selected
out of R, to A, which is a subset of E. Now, turn constants
in atoms in the set A to the variables which occur at
the same position of $\Pi(r')$ according to the correspondence
$\varphi$.
STEP4 Replacing predicates

For each pair of atoms $(\alpha,\beta)$ in $\varphi$ which is a useful
correspondence of $\Pi(r')$ to A, if $\Phi(R,\beta)$ contains an atom
which consists of the same predicate as $\alpha$, replace the
predicate of $\beta$ with the predicate of $\alpha$. Otherwise, let S
be the set of atoms in $\Phi(R,\beta)$ none of whose
predicates occurs in $\Phi(R,\alpha)$. Replace the predicate of
$\beta$ with the predicate of $\gamma (\in S)$ such that

Ves, 7). 

STEP5 Generating a rule 

Finally, generate a new rule r in which cons(r) consists 
of the atom which is generalization of the target instance 
and prem(r) consists of the atoms which are generaliza- 
tions of the atoms in the set A except the target instance. 

3.2 Examples and discussions 

In this section, we present two examples of learning by
ANGEL, and we clarify the effectiveness of ANGEL by
considering the experimental results.

First, we show a simple example in order to follow the
behavior of ANGEL. A set $R_2$ which consists of seven existing
rules defines family relations. $E_1$ is an example for the
target concept "grandmother(s,t)".



$R_2$ = { grandfather(x,z) <- parent(x,y), father(y,z), ... (r1)
uncle(x,z) <- parent(x,y), brother(y,z), ... (r2)
cousin(x,y) <- parent(x,v), parent(y,w), brother(v,w), ... (r3)
parent(x,y) <- mother(x,y), ... (r4)
parent(x,y) <- father(x,y), ... (r5)
family(x,y) <- parent(x,y), ... (r6)
family(x,y) <- brother(x,y) } ... (r7)



$E_1$ = {grandmother(Peter, Mary),
mother(Paul, Mary),
father(Peter, Paul),
mother(Peter, Lucy),
likes(Paul, Mary),
engineer(Peter),
student(Paul)}

If $E_1$ is given, ANGEL starts to extend the example. In
this case, since no atom has been deduced, the extension of
$E_1$ is $E_1$ itself.

In STEP2, candidates of useful subsets of $E_1$ are found for
the rule r1 as follows.

{grandmother(Peter, Mary), father(Peter, Paul), mother(Paul, Mary)} ... (s1)
{grandmother(Peter, Mary), father(Peter, Paul), likes(Paul, Mary)} ... (s2)


In these sets, since the relation

[Π(r1) : s1] ≻ [Π(r1) : s2]

holds, the set s2 is abandoned. As a result, only s1 is
adopted as the useful set of atoms. Likewise, s1 is adopted
for the rule r2, and no set of atoms is adopted for the other
rules r3 ~ r7.

Next, the similarity between Π(r1) and Π(r2) is evaluated.
As a result, the rule r1 is adopted as a useful rule, because
the relation

[s1 : Π(r1)] ≻ [s1 : Π(r2)]

holds.

In STEP3, the generalization is accomplished. At this point,
the following correspondence of Π(r1) to s1 has been found.

{(grandfather(x, z), grandmother(Peter, Mary)),
 (parent(x, y), father(Peter, Paul)),
 (father(y, z), mother(Paul, Mary))}

Therefore, the set of generalized atoms is obtained as follows.

{grandmother(x, z), father(x, y), mother(y, z)}   ... (s1')
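
The STEP3 generalization above can be traced with a minimal Python sketch. It only illustrates the "turning constants to variables" step for this one example; the atom encoding as (predicate, args) tuples and the function name `generalize` are assumptions made here, not part of ANGEL.

```python
def generalize(correspondence):
    """Turn constants into variables along a correspondence.

    `correspondence` is a list of (base_atom, example_atom) pairs and each
    atom is a (predicate, args) tuple. A constant of the example atom is
    replaced by the variable found at the same argument position of the
    paired base atom.
    """
    subst = {}
    for (_, base_args), (_, example_args) in correspondence:
        for variable, constant in zip(base_args, example_args):
            # Assumption: the first variable seen for a constant is kept;
            # conflicting pairings are not detected in this sketch.
            subst.setdefault(constant, variable)
    return [(pred, tuple(subst[c] for c in args))
            for _, (pred, args) in correspondence]


# Correspondence of the atoms of rule r1 to the set s1 from the text.
phi = [
    (("grandfather", ("x", "z")), ("grandmother", ("Peter", "Mary"))),
    (("parent", ("x", "y")), ("father", ("Peter", "Paul"))),
    (("father", ("y", "z")), ("mother", ("Paul", "Mary"))),
]
s1_prime = generalize(phi)
```

Under these assumptions `s1_prime` holds the three generalized atoms of s1' in the order of the correspondence.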

Next, in STEP4, predicates in s1' are replaced with
more applicable ones. In this case, the predicate father in
s1' is replaced with the predicate parent, because the predicate
parent occurs in Φ(R2, father(x, y)). The predicate
mother in s1', in contrast, is not replaced, because the predicate father
never occurs in Φ(R2, mother(y, z)) and the atom mother(y, z)
is the only atom in Φ(R2, mother(y, z)) apart from the atoms in
Φ(R2, father(y, z)). As a result of the replacement of predicates,
the set of atoms is modified to

{grandmother(x, z), parent(x, y), mother(y, z)}.   ... (s1'')

In STEP5, finally, according to the above set s1'', the
following new rule is generated and added to R2.

grandmother(x, z) ← parent(x, y), mother(y, z)   ... (r8)

The rule r8 satisfies the requirement for analogical
generalization given in Definition 7, and it is exactly the appropriate
rule for the target concept. In this case, good learning has
been performed, because a rule which is closely similar to
the rule for the target concept is in the existing knowledge base.
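
The STEP4 decisions in this example can be reproduced with a small forward-chaining sketch in Python. It rests on assumptions that go beyond the text: Φ(R, a) is taken to be the set of atoms derivable from the single atom a under R (with the variables of a treated as constants), rule variables are written with a leading "?", and the names `closure` and `step4` are hypothetical. The final choice of γ from S is not modelled; the sketch only returns the candidate set S.

```python
# Rules r1..r7 of R2; an atom is (predicate, args), rule variables start with "?".
RULES_R2 = [
    (("grandfather", ("?x", "?z")),
     [("parent", ("?x", "?y")), ("father", ("?y", "?z"))]),
    (("uncle", ("?x", "?z")),
     [("parent", ("?x", "?y")), ("brother", ("?y", "?z"))]),
    (("cousin", ("?x", "?y")),
     [("parent", ("?x", "?v")), ("parent", ("?y", "?w")),
      ("brother", ("?v", "?w"))]),
    (("parent", ("?x", "?y")), [("mother", ("?x", "?y"))]),
    (("parent", ("?x", "?y")), [("father", ("?x", "?y"))]),
    (("family", ("?x", "?y")), [("parent", ("?x", "?y"))]),
    (("family", ("?x", "?y")), [("brother", ("?x", "?y"))]),
]


def match(pattern, fact, binding):
    """Extend `binding` so that `pattern` equals `fact`, or return None."""
    (p_pred, p_args), (f_pred, f_args) = pattern, fact
    if p_pred != f_pred or len(p_args) != len(f_args):
        return None
    binding = dict(binding)
    for p, f in zip(p_args, f_args):
        if p.startswith("?"):
            if binding.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return binding


def closure(rules, atoms):
    """Forward-chain `rules` from `atoms` until no new atom is derived."""
    facts = set(atoms)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            bindings = [{}]
            for pattern in body:
                extended = []
                for b in bindings:
                    for fact in facts:
                        b2 = match(pattern, fact, b)
                        if b2 is not None:
                            extended.append(b2)
                bindings = extended
            for b in bindings:
                new = (head[0], tuple(b.get(a, a) for a in head[1]))
                if new not in facts:
                    facts.add(new)
                    changed = True
    return facts


def step4(rules, alpha, beta):
    """Return (predicate of alpha, empty set) if it occurs in the closure of
    beta, otherwise (None, S) with S the candidate atoms described in STEP4."""
    phi_beta = closure(rules, [beta])
    if any(pred == alpha[0] for pred, _ in phi_beta):
        return alpha[0], set()
    preds_alpha = {pred for pred, _ in closure(rules, [alpha])}
    return None, {atom for atom in phi_beta if atom[0] not in preds_alpha}


father_case = step4(RULES_R2, ("parent", ("x", "y")), ("father", ("x", "y")))
mother_case = step4(RULES_R2, ("father", ("y", "z")), ("mother", ("y", "z")))
```

With this reading, `father_case` selects the predicate parent, and `mother_case` leaves mother(y, z) as the only candidate, which agrees with the replacement described above.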

In rule based systems, generally, the lack of rules causes ei- 
ther interruptions or mistakes on inference. ANGEL is useful 
for such a situation, because it is possible to continue infer- 
ence by generating new rules from given examples. 

Next we show an example of acquiring rules for a system
that parses simple English sentences. The target system
parses English sentences by means of the syntactic
rules shown in Figure 2. In this system a sentence is treated
as a list. For example, the sentence “The sun rises in the
east” is represented as the list

[the, sun, rises, in, the, east]

and

noun_phrase([the, sun, rises, in, the, east],
            [rises, in, the, east])

indicates that [the, sun] is a noun phrase. The system
examines whether or not a given sentence is grammatically valid
by backward chaining inference over the syntax
rules.
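
The list-pair convention can be illustrated with a short Python sketch. It is not the Prolog system used in the experiment: each grammar predicate p(s, e) is modelled as a function that takes the list s and returns every possible remainder e, and the helper names `word`, `alt` and `seq` are assumptions introduced here. Only the few rules of Figure 2 needed for the example sentence are encoded.

```python
def word(w):
    """Lexical predicate such as THE(s, e): consume one word from s."""
    return lambda s: [s[1:]] if s and s[0] == w else []


def alt(*parsers):
    """Several rules with the same head: collect the remainders of all."""
    return lambda s: [e for p in parsers for e in p(s)]


def seq(*parsers):
    """Rule body p1(s, v1), p2(v1, v2), ...: thread the remainder through."""
    def parse(s):
        remainders = [s]
        for p in parsers:
            remainders = [e for mid in remainders for e in p(mid)]
        return remainders
    return parse


determiner = word("the")
noun = alt(word("sun"), word("east"))
preposition = word("in")
verb = word("rises")
noun_phrase = alt(seq(determiner, noun), noun)
verb_phrase = alt(verb, seq(verb, noun_phrase))
prepositional_phrase = seq(preposition, noun_phrase)
sentence = alt(seq(noun_phrase, verb_phrase),
               seq(noun_phrase, verb_phrase, prepositional_phrase))

tokens = ["the", "sun", "rises", "in", "the", "east"]
np_remainders = noun_phrase(tokens)
is_sentence = [] in sentence(tokens)
```

Here `np_remainders` contains the single remainder [rises, in, the, east], mirroring the noun_phrase atom above, and `is_sentence` is true because one of the sentence rules consumes the whole list.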
sentence(s, e) ← noun_phrase(s, v1), verb_phrase(v1, e).
sentence(s, e) ← noun_phrase(s, v1), verb_phrase(v1, v2),
                 prepositional_phrase(v2, e).
sentence(s, e) ← present_progressive(s, e).
sentence(s, e) ← present_passive_voice(s, e).
sentence(s, e) ← present_perfect(s, e).
noun_phrase(s, e) ← determiner(s, v1), noun(v1, e).
noun_phrase(s, e) ← noun(s, e).
prepositional_phrase(s, e) ← preposition(s, v1),
                             noun_phrase(v1, e).
verb_phrase(s, e) ← verb(s, e).
verb_phrase(s, e) ← verb(s, v1), noun_phrase(v1, e).
present_progressive(s, e) ← noun_phrase(s, v1),
                            present_BE(v1, v2), present_participle(v2, e).
present_progressive(s, e) ← noun_phrase(s, v1),
                            present_BE(v1, v2), present_participle(v2, v3),
                            noun_phrase(v3, e).
verb(s, e) ← BE(s, e).
verb(s, e) ← main_verb(s, e).
verb(s, e) ← present_verb(s, e).
verb(s, e) ← past_verb(s, e).
BE(s, e) ← present_BE(s, e).
BE(s, e) ← past_BE(s, e).
main_verb(s, e) ← present_main_verb(s, e).
main_verb(s, e) ← past_main_verb(s, e).
present_verb(s, e) ← present_BE(s, e).
past_verb(s, e) ← past_BE(s, e).
present_verb(s, e) ← present_main_verb(s, e).
past_verb(s, e) ← past_main_verb(s, e).
auxiliary_verb(s, e) ← present_auxiliary_verb(s, e).
auxiliary_verb(s, e) ← past_auxiliary_verb(s, e).
participle(s, e) ← present_participle(s, e).
participle(s, e) ← past_participle(s, e).
determiner(s, e) ← THE(s, e).
noun(s, e) ← SUN(s, e).
noun(s, e) ← EAST(s, e).
noun(s, e) ← DOOR(s, e).
noun(s, e) ← HER(s, e).
noun(s, e) ← HE(s, e).
noun(s, e) ← I(s, e).
noun(s, e) ← HOMEWORK(s, e).
present_main_verb(s, e) ← HAVE(s, e).
present_main_verb(s, e) ← RISES(s, e).
present_auxiliary_verb(s, e) ← HAVE(s, e).
present_BE(s, e) ← IS(s, e).
past_participle(s, e) ← CLOSED(s, e).
past_participle(s, e) ← RESPECTED(s, e).
past_participle(s, e) ← FINISHED(s, e).
preposition(s, e) ← IN(s, e).
preposition(s, e) ← BY(s, e).

Figure 2: A part of rules in existing knowledge base 



As Figure 2 indicates, the knowledge base initially lacks a
rule defining the syntax of the present passive voice. We have
therefore tried to generate the lacking rule with ANGEL.

For the target concept “present_passive_voice(s, e)”, we
have given the following example E2 to ANGEL.

E2 = { present_passive_voice([the, door, is, closed], []),
       THE([the, door, is, closed], [door, is, closed]),
       DOOR([door, is, closed], [is, closed]),
       IS([is, closed], [closed]),
       CLOSED([closed], []) }

First, the given example E2 has been extended to the
following set E2'.

E2' = { present_passive_voice([the, door, is, closed], []),
        THE([the, door, is, closed], [door, is, closed]),
        DOOR([door, is, closed], [is, closed]),
        IS([is, closed], [closed]),
        CLOSED([closed], []),
        noun_phrase([the, door, is, closed], [is, closed]),
        sentence([the, door, is, closed], [closed]) }

Then, the useful correspondence has been found as follows
by using a rule for “present_progressive” as a base rule.

{(present_progressive(s, e),
  present_passive_voice([the, door, is, closed], [])),
 (noun_phrase(s, v1),
  noun_phrase([the, door, is, closed], [is, closed])),
 (present_BE(v1, v2), IS([is, closed], [closed])),
 (present_participle(v2, e), CLOSED([closed], []))}

As a result, we have confirmed that ANGEL successfully
generates the following single rule.

present_passive_voice(s, e) ← noun_phrase(s, v1),
                              present_BE(v1, v2),
                              past_participle(v2, e)   ... (r9)

The generated new rule r9 is added to the knowledge base. 

Next, we have given the example sentence “A mouse is
caught by a cat.” for the same target concept.

In this case, two distinct rules r10 and r11 are generated
by using the identical base rule in the existing knowledge
base.

present_passive_voice(s, e) ← noun_phrase(s, v1),
                              present_BE(v1, v2),
                              past_participle(v2, v3),
                              prepositional_phrase(v3, e)   ... (r10)

present_passive_voice(s, e) ← sentence(s, v1),
                              participle(v1, v2),
                              preposition(v2, v3),
                              noun_phrase(v3, e)   ... (r11)

As shown above, ANGEL sometimes generates several rules
for one example. It is therefore important to examine whether
each of the generated rules is appropriate. For instance, the
rule r10 is a suitable rule, whereas the rule r11 is obviously
strange. The reason is that none of the rules in the existing
knowledge base is really similar to the given example. Since
the atom noun_phrase(v3, e) in the selected base rule

present_progressive(s, e) ← noun_phrase(s, v1),
                            present_BE(v1, v2),
                            present_participle(v2, v3),
                            noun_phrase(v3, e)

corresponds to the atom prepositional_phrase(v3, e) in the
rule r10 and to the atom noun_phrase(v3, e) in the rule r11
(namely, the given example is regarded as a sentence
consisting of some phrases followed by a noun phrase), the similarity
between the base rule and the rule r11 is stronger than the
one between the base rule and the rule r10 with respect to these
atoms.

Next, we have supplied the sentence “He was killed by them.”
to attempt to generate a rule for another target concept,
past_passive_voice(s, e). ANGEL could generate a new
rule r12 by employing the rule r10 generated just before.

past_passive_voice(s, e) ← noun_phrase(s, v1),
                           past_BE(v1, v2),
                           past_participle(v2, v3),
                           prepositional_phrase(v3, e)   ... (r12)

In this case, since an appropriate base rule, which did
not exist initially, has appeared in the knowledge base, a good
rule is generated accurately by selecting it. ANGEL is
capable of growing a knowledge base gradually by employing rules
generated by ANGEL itself as base rules.

Let us discuss the computational complexity of ANGEL.
In order to evaluate the similarity between atoms, ANGEL
has to compute the deductive closure of each of the atoms.
Moreover, the similarities between atoms in arbitrary correspondences
have to be estimated to find the most suitable pairing of the
atoms in the given example and the base rule. Therefore, the
procedure of ANGEL may be expensive as a whole, although the
hypothesis space to be considered is small. In fact, with
ANGEL implemented on a Sun SPARCstation 2 with
SICStus Prolog, it took a few minutes to generate an English
syntax rule.

The approach evaluating similarities between atoms based 
on their deductive closures is theoretically interesting, but it 
may not be practical. For the purpose of practical learning, 
some restrictions on either forms of the background knowl- 
edge or the hypothesis language are required like Muggleton’s 
GOLEM[Muggleton 1990]. We think we will have to improve 
the practicability of ANGEL in the near future. 

4 Related works 

In this section, we characterize ANGEL from a viewpoint of 
general machine learning framework. 

ANGEL belongs to the category of learning from examples,
in the sense that it generates new rules by generalizing
given examples. In inductive learning methods, pre-defined
generalization rules are generally used for generalizing
examples. ANGEL also uses three kinds of generalization rules,
corresponding to the dropping condition rule, the turning constants
to variables rule and the constructive generalization rule based
on logical implications [Michalski 1983]; all of them are
considered primary generalization rules in learning from
examples. However, ANGEL differs from ordinary inductive
learning methods in using the existing rules as the
bias. That is, ordinary inductive learning uses no existing
rules or, if it does, uses them only for constructive induction.
ANGEL, on the other hand, employs the similarity between
the existing rules and the given example in order to drop
conditions, so it can reduce the hypothesis space drastically.

ANGEL is related to inductive logic programming (ILP),
because it generates rules represented as Horn clauses by
induction. ILP is also capable of learning new rules with
reference to existing rules. Both Muggleton and Buntine’s
CIGOL [Muggleton and Buntine 1988] and Wirth’s
LFP2 [Wirth 1989], which are typical examples of ILP systems,
use operators based on inverting resolution to augment
incomplete clausal theories. The difference between
these systems and ANGEL is the way of employing existing
background knowledge. In both of those systems,
background knowledge is not employed as a bias at all; in
fact, rules can be acquired with no background knowledge.
Interaction between the user and the system is therefore
inevitable in those systems in order to derive reasonable rules.
ANGEL, in contrast, employs background knowledge as a bias: a given
example is generalized by mapping it onto the structure of a rule
in the existing knowledge base. This provides a strong restriction
for induction and serves to generate a few useful new rules.

ANGEL evaluates a similarity between existing rule and a 
given example to learn a new rule. Therefore it can also be 
regarded as a kind of method for learning by analogy. Davies 
and Russell [1987] have defined, in their paper, reasoning by 
analogy as the process of inferring that a property Q holds of 
a particular situation T (called the target) from the fact that 
T shares a property P with another situation S (called the 
source) that has property Q. In analogy, it is very important 
to match between the target and the source. Similarly, in 
ANGEL, the matching between existing rules and a given 
example, which is called correspondence in this paper, must 
be found successfully. Now we compare ANGEL with several 
methods with respect to the way of matching. 

Haraguchi and Arikawa [1986] have formalized the reason- 
ing by analogy on a deduction system. In their method, 
the domain for reasoning is represented by a set of definite 
clauses, and the similarity between objects is defined as the 
identity of predicates. Therefore the matching is performed 
by pairing the atoms which are described with the same pred- 
icate. On the other hand, ANGEL finds a correspondence 
between atoms based on their similarities, that is, it will not 
require identity of predicates. And it enables ANGEL to 
generate completely novel rules. 

Recently, Arima [1991] has analyzed analogy from the 
point of logical relevance. His formulation is based on the 
idea as follows. 

1. The property to be projected from the source to the 
target must be justified. 

2. The similarities, which means the properties shared by 
both the source and the target, should be formed by the 
minimum justifications. 

Unlike ANGEL, the shared properties must be represented 
by the same predicates both with the source and with the 
target. 

Gentner [1983] has also developed a method, called Structure
Mapping, for the matching between the target and the
source. In her method, first an atom is matched with another
atom when both of them are described with the same
predicate, and next, the objects in each atom are matched.
The process of matching is then repeated based on the newly
matched objects. ANGEL is similar to Structure Mapping,
because the matching between atoms is achieved based on
the matched objects. However, there are the following two
differences between them.

1. Although Structure Mapping requires the identity to 
several kinds of predicates (e.g. greater, cause, etc.) 
in order to match between atoms, ANGEL will not re- 
quire the identity of predicates at all. 

2. In Structure Mapping, the similarity between descrip- 
tions is defined by the identification of predicates and 
the number of matched descriptions. On the other hand, 
in ANGEL, it is defined as the subsumption between 
deductive closures of atoms based on the logical consid- 
eration. 

ANGEL is also related to both explanation-based
learning (EBL) [Mitchell et al. 1986] and Russell’s
single-instance generalization (SIG) [Russell 1987], because all of
them are capable of learning from one example and background
knowledge. However, EBL requires complete
background knowledge, so rules produced by EBL are limited
to ones which are deducible from the background knowledge.
In this sense, EBL cannot generate really new rules. SIG
requires weak background knowledge, called determinations,
instead of complete knowledge. That is, it can learn rules under
comparatively insufficient background knowledge in contrast
to EBL. Properly new rules cannot, however, be generated,
because it does not deal with non-deductive reasoning.

5 Conclusion 

This paper has described an approach to learning from an 
example by analogical generalization. 

The notable features of ANGEL are shown as follows. 

1. ANGEL is able to generate a new rule from a given single 
example by analogical generalization. 

2. A similarity between an existing rule and an example
can be evaluated as a similarity between the atoms forming
each of them.

3. A similarity between atoms is defined based on the
subsumption relation between deductive closures of atoms,
which makes it possible to compute similarities formally.

Through the experiment for the domain of parsing English 
sentences, we have confirmed that ANGEL is useful for ac- 
quiring knowledge on knowledge based systems. 

In this paper, from the inductive learning point of view, 
we have highlighted the method to generate a new rule from 
a given example. The definition of similarity introduced here 
is not specific for inductive learning. We plan to apply this 
idea to other various reasoning paradigms (e.g. ordinary ana- 
logical reasoning, deductive reasoning and so on) to improve 
performance and applicability of them. 

This work was supported partly by the Grant-in-Aid for
scientific research from the Ministry of Education.
Abstract: This paper treats a general type of analogical
reasoning which is described as follows: when two
objects, B (the base) and T (the target), share a property
S (the similarity), it is conjectured that T satisfies
another property P (the projected property) which B
satisfies as well.

Through a formal analysis of this type of analogy, a
logical relation is explored which is necessarily satisfied
by the tuple T, B, S, P under an axiom A. Unlike
previous studies on analogy, this work does not give any
particular assumption a priori to the tuple.

By the analysis, it is shown to be reasonable that ana- 
logical reasoning is possible only if a certain form of rule, 
called the analogy prime rule , is a deductive theorem of 
a given theory, and that, from the rule, together with 
two particular conjectures, an analogical conclusion is 
derived. Also, a candidate is shown for a non-deductive 
inference system which can yield both conjectures. 

1 Introduction 

When we explain a process of reasoning by analogy, we
may say, “An object T is similar to another object B
in that T shares a property S with B and B satisfies
another property P. Therefore, T also satisfies P.”
We may express this more formally using the following
schema.

\[ \frac{\begin{array}{c} S(B) \land P(B) \\ S(T) \end{array}}{P(T)} \]

Here, T will be called the target, B the base, S the
similarity between T and B, and P the projected property.

The above description of the process of analogy is,
however, insufficient. Researchers studying analogy have
come to recognize the necessity of revealing some implicit
condition which influences the process but does not
appear in the above schema. The importance of this has
already been discussed enough in [3]. The implicit
condition to be satisfied by appropriate analogical factors,
T, B, S, and P, can, formally, be characterized only by
a given theory (axiom), written as A. The objective of
this paper is to explore the particular relation of analogy
which T, B, S, P and A necessarily satisfy.

In the study of analogy, the following have been central 
problems: 

1) what object should be selected as a base w.r.t. a target,

2) which property is significant in analogy among properties
shared by two objects, and

3) what property is to be projected w.r.t. a certain similarity.

Many significant works have been vigorously conducted 
on these problems, though they were only partially suc- 
cessful in answering these questions, that is, by giving in- 
tuitive and strong assumptions a priori. In many works, 
a base case was assumed to be given w.r.t. a target case
[4, 11, 10]. In almost all works, the important similarity
(or similarity measure) was defined a priori independently
of what property was projected [20, 6, 10, 7, 5].
In logical works [8, 5], especially in [3], nice logical rela- 
tions among the analogical factors could be seen, though 
they, like others, were given without sufficient examina- 
tions which would show why and how their relations were 
necessary. 

Unlike previous studies on analogy, this work does not
give any particular assumption a priori to the analogical
factors. Clarifying the relation between the factors, T,
B, S, P and A, will be enough to answer the above
three problems once and for all. The relation shown by
this paper is a general solution for them and might show
how useful a formal treatment is in analyzing analogical
behavior.

First, through a logical analysis of analogy, it is shown 
to be reasonable that, when an analogical inference is 
done under a theory A, a particular form of rule must 
be a logical conclusion (a theorem) of A and that ana- 
logical inference is accomplished by two particular types 
of (generally non-deductive) conjectures. Then, a
non-deductive inference is proposed, which is shown to be an
adequate candidate to yield the conclusions of both these
conjectures.

2 A Logical Analysis 

2.1 Preparations 

In this paper, we use standard formal logic and notations,
while defining the following. An n-ary predicate U is
generally expressed by \(\lambda x.Q\), where x is a tuple of n object
variables and Q is a formula in which no object variables
except variables in x occur free. If t is a tuple of n terms,
U(t) stands for the result of replacing each occurrence of
(elements of) x in Q with (each corresponding element
of) t simultaneously. For any formulas A and F, when
\(A \vdash F\) and \(\nvdash F\) (that is, F is not valid), we say F is a
genuine theorem of A and express it simply as \(A \mathrel{|\!\!-\!\!-} F\).

We will use a closed formula of first order logic A for a
theory, (generally n) terms T for a target and (generally
n) terms B for a base. A property is expressed by a
predicate; for instance, a similarity and a projected property
are expressed by predicates S and P respectively.

2.2 Approach To A Seed of Analogy 

We can understand analogical reasoning as follows: 

(1) Example-based Information: “An object, x'
(corresponding to a base), satisfies both properties S
and P (\(\exists x'.(S(x') \land P(x'))\)).”

(2) Similarity-based Information: “Another object,
x (corresponding to a target), satisfies a shared
property S with x' (\(S(x)\)).”

(3) Analogical Conclusion: “The object x would
satisfy the other property P (\(P(x)\)).”

Then,

“Analogical reasoning is to reason (3) from A
together with (1)+(2).” (A)

Let this understanding be our starting point of analy- 
sis. 

As analogy is not, generally, deductive, this starting 
point may, unfortunately, be expressed only as follows. 
In the notation of proof theory, 

\[ A,\ \exists x'.(S(x') \land P(x')),\ S(x) \nvdash P(x). \tag{1} \]

As analogy, however, infers P(x) from the premises, it 
implies that some knowledge is assumed in the premise 
part of (1). Let the assumed knowledge be F(x), provid- 
ing that it depends on the x in general. That is, 

\[ A,\ \exists x'.(S(x') \land P(x')),\ S(x),\ F(x) \vdash P(x). \tag{2} \]



Thus, the essential information newly obtained by analogy
is F(x) in the above rather than the explicit projected
property P. Making J(x) stand for the conjunction
of the example-based information and F(x), the
above meta-sentence is transformed equivalently to

\[ A \vdash \forall x.(J(x) \land S(x) \supset P(x)), \tag{3} \]

because A is closed. This implies that a rule must be
a theorem of A and that the rule concludes any object
which satisfies J(x) to satisfy P when it satisfies S. Once
J is satisfied, (by reason of \(S(x) \supset P(x)\),) the analogical
conclusion (“an object satisfies P”) can be deduced
from the similarity-based information (“the object
satisfies S”). For this reason, this rule will be called the
analogy prime rule (it will be specified in more detail
later), and J will be called the analogy justification.

Moreover, it is improbable that the analogy prime rule
is a valid formula, because, if so, any pair of predicates
can be an analogical pair of a similarity and a projected
property independently of A. Thus, the analogy prime
rule must be a genuine theorem of A,

\[ A \mathrel{|\!\!-\!\!-} \forall x.(J(x) \land S(x) \supset P(x)). \tag{4} \]

Consequently, an object T which satisfies S is concluded
to satisfy P from an analogy prime rule by analogical
reasoning that assumes that T satisfies the analogy
justification (J(T)). That is, our starting point (A) can be
specified from two aspects.

“An analogical conclusion can be obtained from 
an analogy prime rule together with example- 
based information and similarity-based informa- 
tion.” (B) 

“A non-deductive jump by analogy, if it occurs, 
is to assume that the analogy justification of the 
prime rule is satisfied.” (C) 

In the following part of this paper, the analogy jus- 
tification and non-deductivity will be further explored. 
Before beginning an abstract discussion, it may be use- 
ful to see concrete examples of analogical reasoning. The 
next section introduces “target” examples of analogical 
reasoning to be clarified here. 

2.3 Examples 

Example 1: Determination Rule [3]. “Bob’s car
(\(C_{Bob}\)) and Sue’s car (\(C_{Sue}\)) share the property of being
1982 Mustangs (Mustang). We infer that Bob’s car is
worth about $3500 just because Sue’s car is worth about
$3500. (We could not, however, infer that Bob’s car is
painted red just because Sue’s car is painted red.)”

Example-based Information:

\[ \mathit{Model}(C_{Sue}, \mathit{Mustang}) \land \mathit{Value}(C_{Sue}, \$3500), \tag{5} \]

Similarity-based Information:

\[ \mathit{Model}(C_{Bob}, \mathit{Mustang}), \tag{6} \]

Example 2: Brutus and Tacitus [1]. “Brutus feels
pain when he is cut or burnt. Also, Tacitus feels pain
when he is cut. Therefore, if Tacitus is burnt, he will
feel pain.”

Example-based Information:

\[ (\mathit{Suffer}(\mathit{Brutus}, \mathit{Cut}) \supset \mathit{FeelPain}(\mathit{Brutus})) \tag{7} \]

\[ \land\ (\mathit{Suffer}(\mathit{Brutus}, \mathit{Burn}) \supset \mathit{FeelPain}(\mathit{Brutus})) \tag{8} \]

Similarity-based Information:

\[ \mathit{Suffer}(\mathit{Tacitus}, \mathit{Cut}) \supset \mathit{FeelPain}(\mathit{Tacitus}) \tag{9} \]

Example 3: Negligent Student¹. “When I discovered
that one of the newcomers (\(\mathit{Student}_T\)) to our
laboratory was a member of an orchestra club (Orch),
remembering that another student (\(\mathit{Student}_S\)) was a
member of the same club and he was often negligent of study
(Study), I guessed that the newcomer would be negligent
of study, too.”

Example-based Information:

\[ \mathit{Member\_of}(\mathit{Student}_S, \mathit{Orch}) \land \mathit{Negligent\_of}(\mathit{Student}_S, \mathit{Study}) \tag{10} \]

Similarity-based Information:

\[ \mathit{Member\_of}(\mathit{Student}_T, \mathit{Orch}) \tag{11} \]

2.4 Logical Analysis: a rule as a seed 
of analogy 

In treating analogy in a formal system, as the information
of a base object being S and P is projected into
a target object, it is desirable to treat such properties
as objects so that we can avoid the use of second order
language. As an example, the fact that Bob’s car is
a Mustang is represented by “\(\mathit{Model}(C_{Bob}, \mathit{Mustang})\)”
rather than simply as “\(\mathit{Mustang}(C_{Bob})\)”. In the remaining
part, we rewrite S(x) to \(\Sigma(x, S)\) and P(x) to \(\Pi(x, P)\).
\(\Sigma\) will be called a similar attribute, \(\Pi\) will be a projected
attribute, S as an object will be a similar attribute value,
and P as an object will be a projected attribute value.
Then, (4) is rewritten

\[ A \mathrel{|\!\!-\!\!-} \forall x, s, p.(J(x, s, p) \land \Sigma(x, s) \supset \Pi(x, p)), \tag{12} \]

considering the most general case that the analogy
justification J depends on all of these factors.

Again, when the 3-tuple < object: X, similar attribute
value: S, projected attribute value: P > satisfies the
analogy justification J, object X is conjectured to satisfy
the projected property \(\lambda x.\Pi(x, P)\) (analogical conclusion)
just because X has the similarity \(\lambda x.\Sigma(x, S)\).

1 The author thanks Satoshi Sato (Hokuriku Univ.) for showing 
this challenging example. 



That is, J(x,s,p) can be considered a condition, where 
x could be concluded to be p from x being s by analogical 
reasoning. 

Now, recalling that an analogical conclusion is ob- 
tained from the analogy prime rule with example-based 
information and similarity- based information, consider 
what information can be added by the information in 
relation to the analogy prime rule. 

1) Example-based Information: This shows that
there exists an object as a base which satisfies a
similarity and a projected property (\(\exists x'.(\Sigma(x', S) \land
\Pi(x', P))\)). It seems to be adequate that the base,
B, satisfying \(\Sigma(x', S)\) can also be derived to satisfy
\(\Pi(x', P)\) from the prime rule, because B can be
considered a target which has similarity S. That is,
the 3-tuple < B, S, P > satisfies the analogy justification.
Consequently, from arbitrariness in selection
of an object as a base in this information, what is
obtained from this information is \(\exists x'.J(x', S, P)\).

2) Similarity-based Information: This shows that
an object as a target, T, satisfies the same property
S in the above. Just by this fact, an analogical
conclusion is obtained, by assuming that the object
satisfies J by some conjecture. That is, there exists
some attribute value p' such that the 3-tuple < T, S, p' >
satisfies J (\(\exists p'.J(T, S, p')\)).

3) Analogical Conclusion: With the above two
pieces of information, an analogical conclusion, “T
satisfies \(\Pi(x, P)\)”, is obtained from the analogy
prime rule. Therefore, such a 3-tuple < T, S, P >
satisfies J (\(J(T, S, P)\)).

In the above discussion, T, S, and P are arbitrary.
Therefore, the following relation about the analogy
justification turns out to be true:

\[ \forall x, s, p.(\exists x'.J(x', s, p) \land \exists p'.J(x, s, p') \supset J(x, s, p)). \tag{13} \]

(13) can be represented equivalently as follows:

\[ J(x, s, p) = J_{att}(s, p) \land J_{obj}(x, s), \tag{14} \]

where both \(J_{att}\) and \(J_{obj}\) are predicates, that is, each of
them has no free variables other than its arguments.

The point shown by this result is that any analogy
justification can be represented by a conjunction in which
variable x and variable p occur separately in different
conjuncts.
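
One direction of the step from (13) to (14) can be checked mechanically on small finite domains. The following Python sketch is only a sanity check under stated assumptions (finite domains, relations drawn at random with a fixed seed, and the hypothetical names `satisfies_13` and `check_decomposed_forms`); it is not a proof and is not part of the analysis itself.

```python
import itertools
import random


def satisfies_13(J, X, S, P):
    """Check relation (13) for a justification J over finite domains."""
    for x, s, p in itertools.product(X, S, P):
        some_base = any(J(x2, s, p) for x2 in X)    # exists x'. J(x', s, p)
        some_value = any(J(x, s, p2) for p2 in P)   # exists p'. J(x, s, p')
        if some_base and some_value and not J(x, s, p):
            return False
    return True


def random_relation(rng, A, B):
    """A random binary relation on A x B, represented as a set of pairs."""
    return {(a, b) for a in A for b in B if rng.random() < 0.5}


def check_decomposed_forms(trials=200, seed=0):
    """Every J of the form J_att(s, p) and J_obj(x, s) should satisfy (13)."""
    rng = random.Random(seed)
    X, S, P = range(3), range(2), range(3)
    for _ in range(trials):
        j_att = random_relation(rng, S, P)
        j_obj = random_relation(rng, X, S)

        def J(x, s, p, j_att=j_att, j_obj=j_obj):
            return (s, p) in j_att and (x, s) in j_obj

        if not satisfies_13(J, X, S, P):
            return False
    return True


all_decomposed_ok = check_decomposed_forms()
# A justification that ties x directly to p cannot be split as in (14),
# and it indeed violates (13).
entangled_ok = satisfies_13(lambda x, s, p: x == p, range(2), range(1), range(2))
```

The check succeeds for every decomposed justification because the first existential yields J_att(s, p) and the second yields J_obj(x, s), while the entangled example fails, which is the separation of x and p stated above.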

By (12) and (14), the analogy prime rule can be
defined as follows.

Definition 1 Analogy Prime Rule

A rule is called an analogy prime rule w.r.t.
< \(\Sigma(x, s)\); \(\Pi(x, p)\) >, if it has the following form:

\[ \forall x, s, p.(J_{att}(s, p) \land J_{obj}(x, s) \land \Sigma(x, s) \supset \Pi(x, p)), \tag{15} \]

where \(J_{att}\), \(J_{obj}\), \(\Sigma\) and \(\Pi\) are predicates. (That is, each of
\(J_{att}(s, p)\), \(J_{obj}(x, s)\), \(\Sigma(x, s)\) and \(\Pi(x, p)\) is a formula in
which no variable other than its arguments occurs free.)
□

In (15), \(J_{att}(s, p)\) will be called the attribute justification
and \(J_{obj}(x, s)\) will be called the object justification.

Also, by the above discussion, the following two
conjectures can be considered as causes which make analogy
non-deductive.

• Example-based Conjecture (EC): An object
shows an existing concrete combination of a similarity
and a projected property. This specializes the
prime rule and allows it to be applicable to a similar
object. Assuming some generally non-deductive
inference system under A, \(\mathrel{|\!\sim}_A\) (we will propose
such a system later),

\[ \exists x.(\Sigma(x, S) \land \Pi(x, P)) \mathrel{|\!\sim}_A J_{att}(S, P). \tag{16} \]

• Similarity-based Conjecture (SC): Just because
an object satisfies S, application of the specialized
prime rule to the object is allowed.

\[ \Sigma(x, S) \mathrel{|\!\sim}_A J_{obj}(x, S). \tag{17} \]

In case that the attribute justification (\(J_{att}(s, p)\))
is a valid formula, example-based information becomes
unnecessary in yielding an analogical conclusion. Thus, it
could, in general, be essential in analogical reasoning to
guess a \(J_{att}(s, p)\) which is not a valid formula. The object
justification (\(J_{obj}(x, s)\)) is, still, important in another
sense, because it can be considered to express a really
significant similarity. It is not unusual that a really
significant similarity is not observable. Consider the case
of Example 2. Having a nervous system will be a sufficient
condition for an object to feel pain; thus, whether
an object has a nervous system is a significant factor in
making a conjecture on feeling pain. In this case, however,
we could not, without dissection, obtain direct
evidence which shows that Tacitus and Brutus have nervous
systems, while we obtain only the circumstantial evidence
that both feel pain when they are cut. Thus,
the similarity-based conjecture is to guess such a really
significant but implicit similarity, the object justification
(\(J_{obj}(x, s)\)), from an observed similarity \(\Sigma(x, s)\).

To summarize, a logical analysis of analogy could draw 
conclusions as follows. 

Analogical reasoning is possible only if a certain analogy
prime rule is a genuine theorem of a given theory, and the
process of analogical reasoning can be divided into the
following 3 steps: 1) the attribute justification part of the
rule is satisfied by EC from example-based information,
2) the object justification part of the rule is satisfied by
SC from similarity-based information, and 3) from
similarity-based information and the analogy prime rule
specialized by the two preceding steps, an analogical
conclusion is obtained by deduction.

One question remains open: what kind of inference is EC,
and what kind is SC? Though we cannot identify the
mechanism underlying each of the conjectures, we can
propose a (generally) non-deductive inference system as a
candidate for both. The next section shows this.

3 Non-deductive Inference for 
Analogy 

This section explores a type of generally non-deductive 
inference by which a conjecture G is obtained from a 
given theory A with additional information K . 

Generally speaking, what properties should be satisfied by
a generally non-deductive inference? It might be desirable
that a non-deductive inference satisfies at least the
following conditions. First, it should subsume deduction,
that is, any deductive theorem is one of its theorems,
because any deductive conclusion would be desirable.
Secondly, any conclusion obtained by it must be usable
deductively, that is, from such a conclusion it should be
possible to yield more conclusions using, at least,
deduction. And, thirdly, any conclusion obtained must be
consistent with the given information. We define a class of
inference systems which satisfy the above three conditions.

Definition 2 An inference system under a theory $A$
(written $\mid\!\sim_A$) is deductively expansible if the following
conditions are satisfied. For any sets of sentences $A$ and
$K$ and any sentences $G$ and $H$,

i) Subsuming deduction:

if $A, K \vdash G$ then $K \mid\!\sim_A G$.

ii) Deductive usefulness:

if $K \mid\!\sim_A G$ and $A, K, G \vdash H$, then $K \mid\!\sim_A H$.

iii) Consistency:

if $K \mid\!\sim_A G$ and $A \cup K$ is consistent, then
$A \cup K \cup \{G\}$ is consistent.

The following inference system is an example of a
deductively expansible system.

Definition 3 $G$ is a conjecture from $A$ based on $K$ by
(atomic) circumstantial reasoning (written $K \mid\!\approx_A G$)$^2$
iff

i) $A, K \vdash G$, or

ii) $A, E \vdash G$,
if there exists a minimal set of atomic formulas$^3$ $E$
s.t. $A, E \vdash K$, and $A \cup E$ is consistent if
$A \cup K$ is consistent$^4$.

Proposition 1

If $K \mid\!\approx_A G$ and $K, G \mid\!\approx_A H$, then $K \mid\!\sim_A H$.

Corollary 1 If $K \mid\!\approx_A G$, then $K \mid\!\sim_A G$.

Corollary 1 shows that circumstantial reasoning is
deductively expansible, and Proposition 1 (together with
the corollary) shows that inference done by multiple
applications of circumstantial reasoning is also deductively
expansible.
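Definition 3 can be made concrete with a small sketch. The following Python fragment is only an illustration under simplifying assumptions that are not part of the paper: the theory is restricted to ground definite clauses (so every set of atoms is consistent with it and the consistency condition holds trivially), minimality is taken to mean subset-minimality, and all names (`closure`, `minimal_explanations`, `conjectures`, and the atoms of the toy theory) are hypothetical.

```python
from itertools import combinations


def closure(rules, facts):
    """Forward-chain ground definite clauses (head, body) from a set of atoms."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            if head not in known and all(b in known for b in body):
                known.add(head)
                changed = True
    return known


def minimal_explanations(rules, atoms, observed):
    """All subset-minimal sets E of atoms such that rules + E derive `observed`."""
    found = []
    for size in range(len(atoms) + 1):
        for cand in combinations(sorted(atoms), size):
            e = set(cand)
            if any(prev <= e for prev in found):
                continue  # e contains an explanation found earlier, so it is not minimal
            if set(observed) <= closure(rules, e):
                found.append(e)
    return found


def conjectures(rules, atoms, observed):
    """Atoms obtainable by case i) or case ii) of Definition 3 (simplified)."""
    result = closure(rules, observed)        # case i): plain deduction from K
    for e in minimal_explanations(rules, atoms, observed):
        result |= closure(rules, e)          # case ii): explain K by E, then deduce
    return result


# Toy ground theory loosely modelled on Example 2 (an assumption for illustration).
rules = [("feel_pain", ("nervous_sys", "destructive", "suffer"))]
atoms = {"feel_pain", "nervous_sys", "destructive", "suffer"}
observed = {"suffer", "feel_pain"}

print(minimal_explanations(rules, atoms, observed))
print(conjectures(rules, atoms, observed))
```

In this toy case the observation is explained either by itself or by the three body atoms of the rule, so the atom `nervous_sys` appears among the conjectures although it is not deducible from the observation alone.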

Circumstantial reasoning ($K \mid\!\approx_A G$) forms a very
general and useful inference class, in that many types
of inference used in AI can be considered as circumstantial
reasoning. Deduction and abduction, for example,
are obviously circumstantial reasoning. Moreover, if we
loosen the condition “atomic formulas” to “clauses”,
inductive learning from examples is the case where $A$ is
empty in general, $K$ is “examples” and $G$ is inductive
knowledge obtained by “learning”$^{5,6}$.

Now, we assume that both EC and SC are circumstan- 
tial reasoning, but based on different information. Then, 
we can see analogical reasoning in more detail. 

Let an analogy prime rule w.r.t. $\langle \Sigma(x,s); \Pi(x,p) \rangle$
be a theorem of $A$. Then, when example-based information,
$\Sigma(B,S) \land \Pi(B,P)$, is introduced, by circumstantial
reasoning from the prime rule, some justifications are
satisfied, that is,

$$\Sigma(B,S) \land \Pi(B,P) \mid\!\approx_A J_{att}(S,P) \land J_{obj}(B,S), \tag{18}$$

which concludes a specialized prime rule,

2 Circumstantial reasoning is essentially equivalent to
“abduction” + deduction [13, 15]. However, “abduction” has many
definitions and various usages in different contexts, so we would
like to introduce a new term for the type of inference in
Definition 3 to avoid confusion.

3 Atoms, that is, formulas which contain only one predicate
symbol.

4 If there exists such a minimal set of atomic formulas $E$,
case ii) evidently subsumes case i). Thus, case i) can often
be neglected in usual applications, for instance, if $K$ is a
universal formula of the form $\forall x.F(x)$, where $F$ is
quantifier-free. Note that a clause is universal.

5 In this case, $G = E$ in Definition 3, which implies that $G$ is a
minimal set to explain the “example” $K$. Indeed, such minimality is
very common in this field.

6 Such a unified view of various kinds of reasoning in AI was
pointed out by Koichi Furukawa (ICOT) in a private discussion, and a
similar and more intuitive view can be seen in [5].



$$\forall x.\,(J_{obj}(x,S) \land \Sigma(x,S) \supset \Pi(x,P)). \tag{19}$$

Even if similarity-based information $\Sigma(T,S)$ is
introduced, to obtain the analogical conclusion $\Pi(T,P)$ by
circumstantial reasoning, some information apart from the
prime rule turns out to be needed in $A$. And, both EC
and SC are generally needed to accomplish analogical
reasoning, which implies that multiple applications of
circumstantial reasoning are necessary. Even in such a case,
circumstantial reasoning remains worthwhile (Proposition 1).

4 Classification of Analogy and 
Examples 

Each of EC and SC has two cases: a deductive one and
a non-deductive one. According to this criterion,
analogical inference can be divided into 4 types. A typical
example of each type is shown and explored.

4.1 deductive EC + deductive SC 

Typical reasoning of this type was proposed by T. Davies
and S. Russell [3]. They insisted that, to justify an
analogical conclusion and to use information of the base case,
a type of rule, called a determination rule, should be a
theorem of a given theory. The rule can be written as
follows:

$$\forall s,p.\,(\, \exists x'.\,(\Sigma(x',s) \land \Pi(x',p)) \supset \forall x.\,(\Sigma(x,s) \supset \Pi(x,p)) \,) \tag{20}$$

Example 1 (continued). In this example, the following
determination rule is assumed to hold under $A$.

$$\forall s,p.\,(\, \exists x'.\,(Model(x',s) \land Value(x',p)) \supset \forall x.\,(Model(x,s) \supset Value(x,p)) \,) \tag{21}$$

This rule is an analogy prime rule, because

$J_{obj}(x,s) = \Sigma(x,s) = Model(x,s)$,

$J_{att}(s,p) = (\exists x.\, Model(x,s) \land Value(x,p))$,

$\Pi(x,p) = Value(x,p)$.

Moreover,

EC:

$$Model(C_{Sue}, Mustang) \land Value(C_{Sue}, \$3500) \vdash J_{att}(Mustang, \$3500), \tag{22}$$

SC:

$$Model(C_{Bob}, Mustang) \vdash J_{obj}(C_{Bob}, Mustang). \tag{23}$$

This illustrates that reasoning based on determination 
rules belongs to the “deductive EC + deductive SC” type 
and that it can also be done by circumstantial reasoning. 
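The use of the determination rule (21) in this example can be sketched in a few lines. The Python fragment below is a minimal illustration, not the authors' system: the dictionaries `models` and `values` and the function `project_value` are hypothetical names, and the rule is hard-wired rather than proved from a theory.

```python
def project_value(models, values, target):
    """Apply determination rule (21): if some base car shares the target's
    model and has a known value, project that value onto the target."""
    s = models[target]                      # the similar attribute value (the model)
    for base, model in models.items():
        if base != target and model == s and base in values:
            return base, values[base]       # base object and projected value
    return None                             # no suitable base case is known


# Facts of Example 1: both cars are Mustangs, only Sue's value is known.
models = {"C_Sue": "Mustang", "C_Bob": "Mustang"}
values = {"C_Sue": 3500}

print(project_value(models, values, "C_Bob"))
```

Both conjectures are deductive here: the base case establishes the attribute justification (22), and the target's model establishes the object justification (23).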
4.2 deductive EC + non-deductive
SC

This type of analogical reasoning was explored by the
author [1]. It was concluded that, once we assume the
following two premises for analogical reasoning, it seems
to be an inevitable conclusion that analogical reasoning
which infers $P(T)$ from $S(T)$, $S(B)$, and $P(B)$ satisfies
the illustrative criterion. If an inference system satisfies
the criterion, the system is called an illustrative
analogy.

Premise 1: “Analogy is done by projecting properties 
(satisfied by a base) from the base onto a target.” 

Premise 2: “The target is not a special object.” 

Premise 2 is also assumed in this paper; it is translated
into an arbitrary selection of a target object. Premise
1 was translated as follows: $J(B)$ (where $J$ is the
justification in (4) and $B$ stands for a base object) must
be a theorem of $A$, because it is essential in analogical
reasoning to project $J(B)$ onto a target object $T$. That
is, the non-deductive part in this reasoning is just SC,
which conjectures the property of the target object, and
EC must be deductive.

Example 2 (continued). By illustrative analogy, a
target is conjectured to satisfy properties used in an
explanation of why a base satisfies a similarity. In
this example, to explain the phenomena of the base
case, “Brutus feels pain when he is cut or burnt”, the
following sentences must be in $A$.

$$\forall x,i.\,(\, NervousSys(x) \land Destructive(i) \land Suffer(x,i) \supset FeelPain(x) \,), \tag{24}$$

$$\land\; NervousSys(Brutus) \tag{25}$$

$$\land\; Destructive(Cut) \land Destructive(Burn) \tag{26}$$

From (24), the following follows:

$$\forall x,s,p.\,(\, NervousSys(x) \land Destructive(s) \land Destructive(p) \land (Suffer(x,s) \supset FeelPain(x)) \supset (Suffer(x,p) \supset FeelPain(x)) \,), \tag{27}$$

which is an analogy prime rule, that is,

$J_{obj}(x,s) = NervousSys(x)$,

$J_{att}(s,p) = Destructive(s) \land Destructive(p)$,

$\Sigma(x,s) = Suffer(x,s) \supset FeelPain(x)$,

$\Pi(x,p) = Suffer(x,p) \supset FeelPain(x)$.

$J_{att}(Cut, Burn)$ (“Both cut and burn are destructive”)
is a deductive theorem of $A$, and a non-deductive
conjecture, $J_{obj}(Tacitus, Cut)$ (“Tacitus has a nervous
system”), is obtained by circumstantial reasoning
from (24) based on the similarity-based information,
$Suffer(Tacitus, Cut) \supset FeelPain(Tacitus)$.



4.3 non-deductive EC + deductive 
SC 

As far as the author knows, this type of analogy has never 
been discussed. Example 3 seems to show this type of 
analogy. 

Example 3 (continued). First, let us consider what
we know from example-based information in this case.
From the fact that a student ($Student_B$) was a member
of the same club ($Orch$) and often neglected study
($Study$), we could find that “the orchestra club keeps
its members very busy ($BusyClub(Orch)$)” and that
“activities of the club are obstructive to one’s study
($Obstructive\_to(Orch, Study)$)”. This implies that we
knew some causal rule like “If it is a busy club and its
activities are obstructive to something, then any member
of the club neglects the thing.”

$$\forall x,s,p.\,(\, BusyClub(s) \land Obstructive\_to(p,s) \land Member\_of(x,s) \supset Negligent\_of(x,p) \,) \tag{28}$$

Using this rule, we found the above information.

Thus, the above rule is assumed to be a theorem of
$A$. $BusyClub(Orch)$ and $Obstructive\_to(Orch, Study)$
are non-deductive conjectures, and they can be obtained by
circumstantial reasoning based on the above rule, which
is just an analogy prime rule, as follows:

$J_{obj}(x,s) = \Sigma(x,s) = Member\_of(x,s)$,

$J_{att}(s,p) = BusyClub(s) \land Obstructive\_to(p,s)$,

$\Pi(x,p) = Negligent\_of(x,p)$.

4.4 non-deductive EC + non-deductive
SC

As an example of this type, we can take Example 2 again.
We might know neither “Brutus has a nervous system”
nor “Both cut and burn are destructive”, which
corresponds to the case that (25) and (26) are not in $A$ (nor
deductive theorems of $A$) in the previous Example 2.
However, by circumstantial reasoning from (24) based on
example-based information (“Brutus feels pain when he
is cut or burnt”), “Both cut and burn are destructive”
(and “Brutus has a nervous system”) can be obtained,
and based on similarity-based information (“Tacitus feels
pain when he is cut”), “Tacitus has a nervous system”, a
really significant but implicit similarity, is obtained
similarly to the previous example. Consequently, the
analogical conclusion (“Tacitus would feel pain when he is
burnt”) is derived from (27) (or (24)) together with the
above conjectures.
5 Conclusion and Remarks

• Through a logical analysis of analogy, it is shown 
to be reasonable that analogical reasoning is pos- 
sible only if a certain analogy prime rule is a de- 
ductive theorem of a given theory. From the rule, 
together with an example-based conjecture and a 
similarity-based conjecture , the analogical conclusion 
is derived. A candidate is shown for a non-deductive 
inference system which adequately yields both con- 
jectures. 

• Results shown here are general and do not depend 
on particular pragmatic languages like the purpose 
predicate [10] nor on some numeric similarity mea- 
sure [20]. These results can be applied to any normal 
deductive data bases (DDB) which consist of logical 
sentences. 

Application of this analogical reasoning to DDB
may be one of the most fruitful. It is, generally
speaking, very difficult to build a DDB which
involves perfect knowledge about an item. Analogical
reasoning will increase the chance of answering
queries adequately, even when its deductive
operation fails to answer. In a DDB, it is very common
to see inheritance rules and transitivity(-like) rules,
which have the form of the analogy prime rule, for
instance,

Gran_pa(x, y) :- Parent(x, z), Parent(z, y). (29)

This is an analogy prime rule w.r.t.
< Parent(z, y); Gran_pa(x, y) > (z is a variable for the
similar attribute value and x is a variable for the
projected attribute value). Assume that a query
“?- Gran_pa(x, Tom)” is given to a database A which
involves the above rule and the following facts:

Parent(Sue, Tom). (30)

Gran_pa(John, Bob). (31)

Parent(Sue, Bob). (32)

The database cannot answer the query deductively,
because it does not know who is a parent of Sue.
If the database uses the proposed type of analogical
reasoning, it is able to guess Gran_pa(John, Tom)
from Bob’s case just because Tom is similar to Bob in
that they have the same parent.

Interestingly, a method which discovers analogy
prime rules from the knowledge base CYC has been
explored independently [17]. Such methods would make
analogical reasoning more common in DDB.
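The grandparent query above can be traced with a small sketch. The following Python fragment is an illustration only, under the assumption that the database is a pair of sets of ground facts; the function names `deductive_grandpas` and `analogical_grandpas` are hypothetical and rule (29) is hard-coded rather than interpreted.

```python
def deductive_grandpas(parent, gran_pa, target):
    """Answers to ?- Gran_pa(x, target) using stored facts and rule (29) only."""
    by_rule = {x for (x, z) in parent for (z2, y) in parent
               if z == z2 and y == target}
    stored = {x for (x, y) in gran_pa if y == target}
    return by_rule | stored


def analogical_grandpas(parent, gran_pa, target):
    """Guess grandfathers of target from base objects sharing a parent with it."""
    guesses = set()
    for (z, y) in parent:
        if y != target:
            continue                         # z must be a parent of the target
        for (z2, base) in parent:
            if z2 == z and base != target:   # base is similar: same parent z
                guesses |= {x for (x, y2) in gran_pa if y2 == base}
    return guesses


# Facts (30)-(32) of the example.
parent = {("Sue", "Tom"), ("Sue", "Bob")}
gran_pa = {("John", "Bob")}

print(deductive_grandpas(parent, gran_pa, "Tom"))
print(analogical_grandpas(parent, gran_pa, "Tom"))
```

Deduction yields no answer because no parent of Sue is recorded, whereas the analogical step retrieves Bob as a base case and projects his grandfather onto Tom.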

• As a side effect of this analysis, it becomes
possible to compare analogy formally with other
kinds of reasoning which have been studied vigorously
in the area of artificial intelligence. Analogical
reasoning differs from other reasoning, abductive
and deductive, in that analogical reasoning
actually uses example-based information (the
base information). Consider the difference from
abduction in the above database case.
Even if the database uses (ordinary) abductive
reasoning for the query, it cannot specify an adequate
grandparent of Tom; the possible answers
will be x s.t. Gran_pa(x, Tom), Parent(x, Sue),
(∃z.)(Parent(x, z), Parent(z, Tom)), or Sue assuming
Parent(Sue, Sue), etc. [2, 14, 18, 9]. The reason
for this failure is that abduction tries to explain only
the target case.

Moreover, compared with enumerative induction
and case-based reasoning (CBR), in which the use
of examples is essential as in analogical reasoning,
analogical reasoning has a salient feature in
depending more strongly on background knowledge
(a given theory). Analogy can be seen as a
single instance generalization, as Davies and Russell
pointed out [3]. Take Example 3. From
the analogy prime rule (28) and example-based
information of a base case ($Student_B$), some
non-deductive inference (e.g. circumstantial reasoning)
yields a more specified analogy prime rule,

$$\forall x.\,(\, Member\_of(x, Orch) \supset Negligent\_of(x, Study) \,), \tag{33}$$

which is a generalization of the example-based
information,

$$Member\_of(Student_B, Orch) \land Negligent\_of(Student_B, Study). \tag{34}$$

We should note that, in the process of this single
instance generalization, an analogy prime rule in the
background knowledge is used as an intermediary,
and this might be considered the reason why analogy
seems more plausible than a simple single instance
generalization which yields (33) just from (34).

In the research of formal inductive inference [16, 12],
background knowledge does not play such an
important role. So, plenty of examples are needed
until a plausible conclusion is obtained. Concerning
CBR [19], though it uses base cases like analogical
reasoning and, in order to retrieve its base
cases, it uses an index which corresponds to the
similarity $\Sigma$, the index is assumed to be given
rather than obtained by using background knowledge.
Intuitively speaking, these methods will be very useful
when the background knowledge is rather poor or
difficult to formulate; when the background knowledge
is extremely strong or can be formulated perfectly,
deduction will be most useful. On the other
hand, the proposed type of analogy will be useful
when the background knowledge is rather strong but
difficult to formulate.

• An implementation system for this type of analogy
has been developed. Given a theory $A$, a target
$T$ and a projected attribute $\Pi(x,p)$ (from a query,
“?- $\Pi(T,p)$”), this system finds a base $B$, a
similarity $\Sigma(x,S)$ and a projected property $\Pi(x,P)$
(i.e. “$\Pi(T,P)$” is the answer to the query) by a process
with backtracking, according to the following steps:

1) Find a separate rule $SepR$ s.t. $A \vdash SepR$,

where $SepR = \Pi(x,p) \leftarrow G_{att}(s,p), G_{obj}(x,s)$.

2) Take a similar attribute $\Sigma(x,s)$

s.t. $\Sigma(x,s)$ […] $G_{obj}(x,s)$.

3) Obtain the similar attribute value $S$

as a side effect of a proof of $A \vdash \exists s.\,\Sigma(T,s)$.

4) Retrieve a base $B$ and obtain the projected

attribute value $P$

as a side effect of a proof of
$A \vdash \exists x,p.\,(\Sigma(x,S) \land \Pi(x,p))$.

Here, a separate rule (w.r.t. $\Pi(x,p)$) is a Horn clause
in which the head is $\Pi(x,p)$, and no variable of $x$
appears in the same conjunct of the body as any
variable of $p$. This system guesses successfully
for the examples shown here, though each of
them is translated into a set of Horn clauses.

Significant restrictions are needed on the time com- 
plexity of this process. Details of this system will 
be reported elsewhere. 
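The syntactic condition on separate rules can be checked mechanically. The following Python fragment is a minimal sketch of that check alone, not of the implemented system: the representation of a clause body as a list of (predicate, arguments) pairs and the name `is_separate_rule` are assumptions made for illustration.

```python
def is_separate_rule(x_vars, p_vars, body):
    """True if no conjunct of the body mentions both a variable of x
    and a variable of p (the head is assumed to be Pi(x, p))."""
    x_vars, p_vars = set(x_vars), set(p_vars)
    for _predicate, args in body:
        mentioned = set(args)
        if mentioned & x_vars and mentioned & p_vars:
            return False
    return True


# Rule (28): Negligent_of(x, p) <- BusyClub(s), Obstructive_to(p, s), Member_of(x, s)
body_28 = [("BusyClub", ("s",)),
           ("Obstructive_to", ("p", "s")),
           ("Member_of", ("x", "s"))]

# Rule (29): Gran_pa(x, y) :- Parent(x, z), Parent(z, y),
# read with y as the object variable and x as the projected attribute value.
body_29 = [("Parent", ("x", "z")), ("Parent", ("z", "y"))]

print(is_separate_rule(["x"], ["p"], body_28))
print(is_separate_rule(["y"], ["x"], body_29))
```

A body containing a conjunct such as R(x, p) would be rejected, since the object variable and the projected attribute variable occur together in it.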
Appendix

Proposition 1.

If $K \mid\!\approx_A G$ and $K, G \mid\!\approx_A H$, then $K \mid\!\sim_A H$.

Proof of Proposition 1.

For any formula $G$, if $K \mid\!\approx_A G$ and $K, G \mid\!\approx_A H$, we
write $K \mid\!\sim_A H$.

i) Subsuming deduction:

if $A, K \vdash H$ then $K \mid\!\sim_A H$.

(proof)

$K \mid\!\approx_A K$. (from subsuming deduction of “$\mid\!\approx_A$”)
$A, K \vdash H \Rightarrow K \mid\!\approx_A H$. (from Definition 3 i))
Therefore, $K \mid\!\sim_A H$.

ii) Deductive usefulness:

if $K \mid\!\sim_A H$ and $A, K, H \vdash L$, then $K \mid\!\sim_A L$.
(proof)

$A, K, H \vdash L \Leftrightarrow A \vdash K \land H \supset L$.

For any formula $G$ s.t. $K \mid\!\approx_A G$ and $K, G \mid\!\approx_A H$,

case-i) $A, K, G \vdash H$ (from $K, G \mid\!\approx_A H$)

From the premises, $A, K, G \vdash L$.

Therefore, $K, G \mid\!\approx_A L$. (from Definition 3 i))

case-ii) otherwise, for some minimal set of atomic
formulas $E$ s.t. $A, E \vdash K \land G$,

$A, E \vdash K \land H$. (from $K, G \mid\!\approx_A H$)

Therefore, $A, E \vdash L$.

Thus, $K, G \mid\!\approx_A L$.

Thus $K \mid\!\sim_A L$.

iii) Consistency:

if $K \mid\!\sim_A H$ and $A \cup K$ is consistent, then
$A \cup K \cup \{H\}$ is consistent.

(proof)

$A \cup K$ is consistent.

$\Rightarrow A \cup K \cup \{G\}$ is consistent. (from $K \mid\!\approx_A G$)
$\Rightarrow A \cup E$ is consistent. (from $K, G \mid\!\approx_A H$)

$\Rightarrow A \cup K \cup \{H\}$ is consistent. (because $A, E \vdash K \land H$)

Corollary 1.

If $K \mid\!\approx_A G$, then $K \mid\!\sim_A G$.

Proof of Corollary 1.

$K \mid\!\approx_A K$. (from subsuming deduction)

If $K \mid\!\approx_A K$ and $K, K \mid\!\approx_A G$, then $K \mid\!\sim_A G$. (from
Proposition 1)

Therefore,

if $K \mid\!\approx_A G$, then $K \mid\!\sim_A G$.
Abstract

If realistic systems are to be successfully modelled and 
diagnosed using model-based techniques, a more 
expressive language than classical logic is required. In 
this paper, we present a definition of diagnosis which 
allows the use of a nonmonotonic construct, negation as 
failure, in the modelling language. This definition is 
based on the generalised stable model semantics of 
abduction. 

Furthermore, we argue that, if negation as failure is per- 
mitted in the modelling language, the distinction 
between abductive and consistency-based diagnosis is 
no longer clear. Our definition allows both forms of 
diagnosis to be expressed in a single framework. It also 
allows a single inference procedure to perform abduc- 
tive or consistency-based diagnoses, as appropriate. 

1 Introduction 

Many different definitions of diagnosis have been used 
in an attempt to formalise and automate the diagnosis 
process. In the so-called ‘logical’ approach, two frame- 
works, namely the consistency-based [Reiter 1987] and 
abductive [Cox and Pietrzykowski 1986], have attracted 
a lot of attention. Typically, the modelling language 
used in these frameworks is first order logic (or some 
subset of it). In this paper we present a unified frame- 
work for diagnosis which brings together these two 
styles of diagnosis, as well as providing a non-monot- 
onic modelling language. 

We were primarily motivated by the need to incorporate 
negation as failure, the non-monotonic construct in 
logic programming, into the modelling language. We 
first show the need for this construct through some 
examples, and then argue that the incorporation of 
negation as failure in the modelling language necessi- 
tates the inclusion of both consistency-based and 
abductive diagnosis within the same framework. We 
then present our unified framework, which allows nega- 
tion as failure in the modelling language and naturally 
incorporates both abductive and consistency-based 
diagnosis. We then show that, in special cases, our
approach reduces to pure consistency-based and pure
abductive diagnosis, i.e. it is a generalisation of both styles.

Our work is similar in spirit to the work of Console and 
Torasso, [1990],[1991], but goes beyond it in many 
ways. We will compare our approach to that of Console 
and Torasso in a later section. Our proposed framework 
is based on the Generalised Stable Model semantics 
[Kakas and Mancarella 1990a] of generalised logic pro- 
grams with abduction, strengthening the link between 
logic programming and diagnosis first explored in [Esh- 
ghi 1990]. 

2 Consistency-based and abductive 
approaches to diagnosis 

In both consistency-based and abductive approaches, a
set of axioms SD (called the system description) models
the system under investigation, and a set of abnormality
assumptions $Ab = \{ab_1, ab_2, \ldots, ab_n\}$ represents the possible
underlying causes of failure. A set of statements, OBS,
represents observations of the behaviour of the system
which are to be explained.

In the consistency-based approach, a diagnosis is a set
of abnormality assumptions, $\Delta$, such that

(1) $SD \cup OBS \cup \Delta \cup \{\lnot ab_k \mid ab_k \in Ab - \Delta\}$ is consistent.

The consistency-based approach focuses primarily on a 
model of the system’s correct behaviour. When the 
abnormality assumptions relate to the failure of the 
components of the system, it attempts to find a set of 
normality and abnormality assumptions which can be 
assigned to the system’s components to give a theory 
consistent with the observations. 

In the abductive approach, a diagnosis is a set of
abnormality assumptions, $\Delta$, such that

(2) $SD \cup \Delta \vdash OBS$
$SD \cup \Delta$ is consistent.

The abductive approach primarily models the behaviour
of a failing system, by using fault models in the system
description, SD. The diagnosis process consists of looking
for a set of abnormality assumptions which, when
adopted, will logically predict the observed faulty
behaviour given the system description and the context
of the observation.

In both approaches, a diagnosis $\Delta$ is defined to be
minimal if there is no other diagnosis, $\Delta'$, which is a proper
subset of $\Delta$.
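The two definitions can be compared on a toy system by brute force. The following Python fragment is a sketch under assumptions that are not in the paper: the system description is a hypothetical propositional theory over three atoms (`ab1`, `ab2`, `light`), consistency and entailment are decided by enumerating truth assignments, and all function names are illustrative.

```python
from itertools import combinations, product

ATOMS = ["ab1", "ab2", "light"]
AB = ["ab1", "ab2"]


def sd(m):
    """Hypothetical SD: (not ab1 and not ab2) -> light;  ab1 -> not light."""
    good = m["ab1"] or m["ab2"] or m["light"]
    fault = (not m["ab1"]) or (not m["light"])
    return good and fault


def obs(m):
    """Observation: the light is off."""
    return not m["light"]


def models(constraint):
    """Enumerate the truth assignments over ATOMS that satisfy constraint."""
    for bits in product([False, True], repeat=len(ATOMS)):
        m = dict(zip(ATOMS, bits))
        if constraint(m):
            yield m


def consistency_based(delta):
    """Condition (1): SD, OBS, delta and the negation of all other ab's are consistent."""
    return any(True for _ in models(
        lambda m: sd(m) and obs(m) and all(m[a] == (a in delta) for a in AB)))


def abductive(delta):
    """Condition (2): SD with delta is consistent and entails OBS."""
    ms = list(models(lambda m: sd(m) and all(m[a] for a in delta)))
    return bool(ms) and all(obs(m) for m in ms)


def minimal(is_diagnosis):
    """Subset-minimal diagnoses, found by enumerating candidates by size."""
    found = []
    for k in range(len(AB) + 1):
        for cand in combinations(AB, k):
            d = set(cand)
            if is_diagnosis(d) and not any(f < d for f in found):
                found.append(d)
    return found


print(minimal(consistency_based))
print(minimal(abductive))
```

In this toy theory only `ab1` has a fault model, so it is the single minimal abductive diagnosis, while the consistency-based definition also admits `ab2`, whose abnormality merely withdraws the prediction that the light is on.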

3 The Diagnosis Problem 

The system description used in model-based diagnosis 
takes one of two forms. It is either a causal model, or a 
model consisting of the system’s structure and the be- 
haviour of individual components. In general, work on 
abductive diagnosis has focused on the former, while 
work on consistency-based diagnosis has focused on the 
latter. 

For the purposes of this paper, we adopt a specification 
of a diagnosis problem based on those used in [deKleer 
and Williams 1987] and [Reiter 1987], which uses a 
component-based approach. However, the results hold 
equally for a causal model-based approach, and for this 
reason, we adopt slightly more general language in the 
definition. 

Definition: 

A diagnosis problem consists of a triple, <SD, OBS, C>
where:

(i) The system description, SD, specifies the behaviour
of the system.

(ii) The observation set, OBS, specifies a set of
observations of the system as unit clauses.

(iii) C consists of constants, $c_i$, which represent causal
clusters within the system.

Causal clusters are groups of causes of abnormal system
behaviour which it makes sense to consider together.
Each cause, n, within the cluster, $c_i$, is modelled in SD
with two clauses:

effects_of_cause_n <- ab(c_i, n).

ab(c_i) <- ab(c_i, n).

Furthermore, if so desired, we can define emergent
properties of the system which occur when none of the causes
in cluster $c_i$ are present, the ‘good behaviour model’ of
this cluster,

good_behaviour_model <- not ab(c_i).

In the component-based approach, $c_i$ represents a
component, and each cause in cluster $c_i$ represents a possible
fault model of the component. Note that the effects of a
cause need not be defined deterministically. For example,
the ‘arbitrary behaviour’ mode of a component,
proposed in [deKleer and Williams 1989], is consistent with
any behaviour of the component, but predicts nothing.

The logical language adopted to represent SD can vary
with the definition of diagnosis adopted. In this paper,
we focus on two possible languages: classical logic, as
adopted by Reiter [1987], and Horn clauses with negation
as failure, as used in the logic programming
community.

4 The need for negation as failure in the 
system description 

The desire to integrate consistency-based and abductive 
diagnosis was motivated primarily by the need to in- 
clude negation as failure in our models. The following 
two examples illustrate this need: 

RAM modelling 

In order to model the behaviour of a random access
memory cell, we needed an axiom that says: the content
of a cell at time T is X if X was written to this cell at time
T', and no other write operation has been performed
between T' and T. The most straightforward way of writing
this is as the clauses

contents(Cell, X, T) <- written(Cell, X, T'),
T' < T,
not over-written(Cell, T', T).

over-written(Cell, T', T) <- written(Cell, X, T''),
T' < T'' < T.

This is an instance of the ‘frame problem’ being solved
through negation as failure, as explored in [Shanahan
1989]. If we don’t use negation as failure, or some other
non-monotonic device, we need to have axioms which
allow us to derive ¬over-written(Cell, T', T) for all cells and
all time instants, which is very inefficient both in terms
of speed of inference and storage required.
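The effect of the two clauses can be imitated directly over a finite log of write events. The following Python fragment is a sketch only: it assumes that `written` facts are given as (cell, value, time) triples, and it reads negation as failure as the absence of a matching triple in the log; the function names are hypothetical.

```python
def over_written(writes, cell, t1, t):
    """over-written(Cell, T', T): some write to the cell strictly between T' and T."""
    return any(c == cell and t1 < t2 < t for (c, _x, t2) in writes)


def contents(writes, cell, t):
    """Values X with written(Cell, X, T'), T' < T and not over-written(Cell, T', T)."""
    return {x for (c, x, t1) in writes
            if c == cell and t1 < t and not over_written(writes, cell, t1, t)}


# A hypothetical log of write events: (cell, value, time).
writes = [("c0", "a", 1), ("c0", "b", 3), ("c1", "z", 2)]

print(contents(writes, "c0", 2))
print(contents(writes, "c0", 5))
```

No fact stating that a cell was left untouched is stored anywhere; the persistence of a value follows from failing to find an intervening write, which is the saving the clauses above are meant to achieve.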
Pre-Charged Lines

A common technique used in the computer industry to 
implement data buses is the pre-charged line. Devices 
communicate with one another using transmitters and 
receivers, all connected to a common line whose value 
floats to 1 when no transmitter is transmitting. (There are 
n lines for an n-bit wide data bus. Here we concentrate 
on one line). 

Physically, a value of 1 corresponds to high voltage, and 
a value of 0 to low voltage. In order to give the line its 
pre-charged value, it is connected to the positive power 
line by means of a pull-up resistor. Figure 1 gives a sche- 
matic of a typical pre-charged line. 

To transmit a 0, a transmitter on a line pulls the line to 
low. Since lines are pre-charged, transmitting a 1 does 
not involve any action by the transmitter. (Obviously, 
there is a bus protocol to determine which transmitter, if 
any, is transmitting at any given time. Here we ignore 
protocol issues.) 

The behaviour of pre-charged lines is best modelled by 
a default reasoning mechanism. The default value of a 
line is assumed to be 1 unless it can be proved to be 0. 
Using negation-as-failure, we could represent this as: 

received_value(Line, 0) <- driven_value(Line, 0).
received_value(Line, 1) <- not driven_value(Line, 0).
driven_value(Line, 0) <- connected(Line, output(X)),
transmits(X, 0).

The alternative, avoiding the use of negation-as-failure,
would be to have an axiom such as:

$\lnot driven\_value(Line,0) \leftarrow \forall X\,(connected(output(X),Line) \rightarrow \lnot transmits(X,0))$.

However, in order to prove
$\forall X\,(connected(output(X),Line) \rightarrow \lnot transmits(X,0))$,
we would need closure axioms
exhaustively enumerating all the transmitters on the
line, which would be both cumbersome to write and
inefficient to reason with.

Full details of this modelling problem are given in [Esh- 
ghi and Preist 1992]. 
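The default behaviour of a pre-charged line can be mimicked in a few lines. The Python fragment below is a sketch under the assumption that `connected` and `transmits` are finite sets of ground facts; the function names and the example bus are hypothetical.

```python
def driven_low(line, connected, transmits):
    """driven_value(Line, 0): some transmitter connected to the line transmits 0."""
    return any(ln == line and (dev, 0) in transmits for (ln, dev) in connected)


def received_value(line, connected, transmits):
    """The line reads 0 if it is provably driven low, and 1 by default otherwise."""
    return 0 if driven_low(line, connected, transmits) else 1


# Hypothetical bus: two transmitters on bus0, one on bus1; only tx2 pulls low.
connected = {("bus0", "tx1"), ("bus0", "tx2"), ("bus1", "tx3")}
transmits = {("tx2", 0)}

print(received_value("bus0", connected, transmits))
print(received_value("bus1", connected, transmits))
```

The value 1 is never asserted for any line; it is obtained whenever the search for a transmitter pulling the line low fails, so no enumeration of all transmitters has to be written down.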

5 Negation As Failure blurs the distinction 
between abductive and consistency-based 
diagnosis 

Conceptually, the processes behind abductive and con- 
sistency-based diagnoses are quite different. In consist- 
ency-based diagnosis, one removes normality 
assumptions until the theory regains consistency. In 
abductive diagnosis, one adds abnormality assumptions 
until the specified bad observations are provable in the 
theory. 

However, by moving to a nonmonotonic theory, we can
use the same process to perform both styles of diagnosis.
We use negation as failure to represent the good
behaviour of a cluster as its default behaviour:

behaviour <- not ab(c)

In a situation where the system is malfunctioning, where
the standard consistency-based approach would derive
an inconsistency only after adding normality assumptions,
we get an inconsistency without adding any
assumptions. This is because negation as failure results
in clusters defaulting to their ‘good’ behaviour model.
Furthermore, the theory can be restored to consistency
by adding abnormality assumptions, as in abduction,
rather than by removing normality assumptions as in the
standard consistency-based approach.

It is exactly because of this effect that an abductive 
framework can be used to represent both consistency- 
based and abductive diagnoses. A similar approach to 
representing a component’s good behaviour as its de- 
fault behaviour was introduced in the context of the 
Nonmonotonic ATMS, in [Dressier 1990]. 

If we are to use negation as failure in the system descrip- 
tion, as we argued we need to do in many instances, it is 
necessary to integrate abductive and consistency-based 
approaches. This is because, in a logic with negation as 
failure, consistency-based and abductive diagnoses are 
the dual of each other. By passing through a negation, 
you pass from a consistency-based problem to an abduc- 
tive problem, or vice-versa. To see this, let us consider 
some simple examples; 

a) Consistency-Based diagnosis
SD: obs <- not g
g <- ab(c)
OBS: $\lnot obs$

In a consistency-based diagnosis, we attempt to restore
consistency by making assumptions so as to ‘not-prove’
a certain proposition which contradicts the integrity
constraints. In the case of the above example, we
wish to not-prove obs. However, to do this, we must
prove the negated goal, g. Hence we want an abductive
diagnosis of the observation, g.

b) Abductive diagnosis
SD: obs <- not g
g <- ab(c)
OBS: obs

In an abductive diagnosis, we wish to make assumptions
so as to prove a certain proposition which is
required to be true by the integrity constraints. In the
above example, we wish to prove obs. However, to do
this, we must fail to prove the negated goal, g. Hence,
we want a consistency-based diagnosis for the
observation $\lnot g$.

Thus a diagnostic problem of one sort may have a
diagnostic problem of the other sort embedded in it. So,
when the modelling language includes negation as failure,
abductive and consistency-based diagnosis cannot
be considered in isolation from each other. It is this that
led us to formulate this integration.

6 The Generalised Stable Model Semantics 
for Abduction 

Various semantics have been proposed for abduction, 
both formally and informally. Originally, an abductive 
explanation for an observation was informally defined 
as a set of assumables which, when added to a theory, al- 
lowed proof of the observation. This was then formal- 
ised to give a metalevel definition of abduction in [Esh- 
ghi and Kowalski 1989]. 

Console et al. [1990] have used the completion semantics
to give a semantics to abduction in Horn clause theories.
Recently, they have extended it to cover hierarchical
logic programs [Console et al. 1991].

The semantics of abduction which we have chosen to 
use, however, is that provided by Kakas and Mancarella 
[1990a]. By extending the stable model semantics of 
logic programs [Gelfond and Lifschitz 1988], they give 
a semantics for abduction which holds for arbitrary gen- 
eral logic programs with integrity constraints. 

Here, we briefly recall their definitions; 

Definition 1 

An abductive framework is a triple <P,A,IC> where

1) P is a set of clauses of the form $H \leftarrow L_1, \ldots, L_k$, $k \geq 0$,
where H is an atom and each $L_i$ is a literal.

2) A is a set of predicate symbols, the abducible predicates.
The abducibles, Ab, are then all ground atoms with
predicate symbols in A.

3) IC, the integrity constraints, is a set of closed formulae.

Hence an abductive framework extends a logic program 
to include integrity constraints and abducibles. The se- 
mantics of this framework is based on the stable model 
semantics for logic programs; 

Definition 2 

Let P be a logic program, and M a set of atoms from the
Herbrand base. Define $P_M$ to be the set of ground Horn
clauses formed by taking ground(P), in clausal form, and
deleting;

(i) each clause that has a negative literal $\neg l$ in its body,
and $l \in M$.

(ii) all negative literals $\neg l$ in the body of clauses, where
$l \notin M$.

M is a stable model for P if M is the minimal model of
$P_M$.
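For a finite ground program, Definition 2 can be checked mechanically. The following is only a minimal sketch: the clause encoding `(head, positive_body, negative_body)` and the function names `reduct`, `minimal_model`, `is_stable` and `stable_models` are assumptions made for illustration, and the exhaustive enumeration is exponential in the number of atoms.

```python
from itertools import chain, combinations

# A ground clause is encoded (by assumption) as
# (head, positive_body_atoms, negatively_occurring_atoms).

def reduct(program, m):
    """Build P_M: drop clauses blocked by m, strip remaining negative literals."""
    horn = []
    for head, pos, neg in program:
        if any(atom in m for atom in neg):   # (i) some "not l" has l in M
            continue
        horn.append((head, frozenset(pos)))  # (ii) remaining "not l" are deleted
    return horn

def minimal_model(horn):
    """Least model of a ground definite program, by forward chaining."""
    model = set()
    changed = True
    while changed:
        changed = False
        for head, pos in horn:
            if head not in model and pos <= model:
                model.add(head)
                changed = True
    return model

def is_stable(program, m):
    """M is stable iff it equals the minimal model of P_M."""
    return minimal_model(reduct(program, m)) == set(m)

def stable_models(program, atoms):
    """Enumerate all stable models by testing every subset of atoms."""
    ordered = sorted(atoms)
    subsets = chain.from_iterable(
        combinations(ordered, r) for r in range(len(ordered) + 1))
    return [set(s) for s in subsets if is_stable(program, set(s))]
```

For the two-clause program `p <- not q` and `q <- not p`, encoded as `[("p", [], ["q"]), ("q", [], ["p"])]`, this enumeration finds the two stable models {p} and {q}.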

This definition is extended to give a semantics to abduc- 
tive frameworks. 



Definition 3 

Let <P,A,IC> be an abductive framework, and $\Delta \subseteq atoms(A)$
be a set of abducibles. Then the set $M(\Delta)$ of ground
atoms is a generalised stable model (GSM) for <P,A,IC>
iff it is a stable model for the logic program $P \cup \Delta$, it is a
model for the integrity constraints IC, and $\Delta = A \cap M(\Delta)$.

The above definition is an extension of that in [Kakas
and Mancarella 1990a] to allow abducibles to appear in
the head of a clause. As a result of this, the set of abducibles
chosen as generators can be smaller than $\Delta$, the set
of abducibles true in the generalised stable model.

A unit clause, q, representing an observation, has an abductive
explanation with hypothesis set $\Delta$ if there exists
a generalised stable model, $M(\Delta)$, in which q is true.

Equivalently, we can say that q has an abductive explanation,
$\Delta$, within the abductive framework <P,A,IC> if the
abductive framework <P,A,IC+q> has a generalised stable
model $M(\Delta)$. Having q in the integrity constraints imposes
the condition that q must be true in the generalised
stable model, and hence must follow from the logic program
together with the set of abducibles chosen.
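The search for abductive explanations described above can be sketched by brute force for small ground frameworks. In this sketch the clause encoding `(head, positive_body, negative_body)`, the representation of integrity constraints as Python predicates over a candidate model, and the names `gsms` and `explanations` are all illustrative assumptions; it is not the procedure of Kakas and Mancarella.

```python
from itertools import chain, combinations

def powerset(items):
    ordered = sorted(items)
    return map(set, chain.from_iterable(
        combinations(ordered, r) for r in range(len(ordered) + 1)))

def least_model_of_reduct(program, m):
    """Minimal model of the reduct of `program` with respect to m."""
    horn = [(h, set(pos)) for h, pos, neg in program if not set(neg) & m]
    model, changed = set(), True
    while changed:
        changed = False
        for h, pos in horn:
            if h not in model and pos <= model:
                model.add(h)
                changed = True
    return model

def gsms(program, abducibles, constraints, atoms):
    """Return (delta, model) pairs satisfying the three GSM conditions."""
    found = []
    for delta in powerset(abducibles):
        extended = program + [(a, [], []) for a in delta]
        for cand in powerset(atoms):
            if least_model_of_reduct(extended, cand) != cand:
                continue                      # not a stable model of P u delta
            if cand & set(abducibles) != delta:
                continue                      # delta must be the abducibles true in M
            if all(ic(cand) for ic in constraints):
                found.append((delta, cand))   # M satisfies IC
    return found

def explanations(program, abducibles, constraints, atoms, q):
    """Hypothesis sets explaining q: GSMs of the framework with q added to IC."""
    extra = constraints + [lambda m: q in m]
    return [delta for delta, _ in gsms(program, abducibles, extra, atoms)]

# The small program from the earlier examples: obs <- not g.  g <- ab(c).
P = [("obs", [], ["g"]), ("g", ["ab(c)"], [])]
ATOMS = {"obs", "g", "ab(c)"}
```

With this encoding, `explanations(P, ["ab(c)"], [], ATOMS, "g")` yields the single hypothesis set {ab(c)}, while `obs` is explained by the empty set of assumptions.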



7 Generalised Stable Models and Diagnosis 

The generalised stable model semantics for abduction 
can be applied to diagnosis by mapping a diagnosis 
problem, <SD, OBS, C>, with multiple observations, onto 
an abductive framework as follows; 

• Represent the system description, SD, as a logic 
program with integrity constraints, <P,IC>. The 
integrity constraints will usually contain sen- 
tences stating that observation points cannot 
take multiple values at a given time. 

• Let the abducibles represent the causes within
the clusters, $\{ab(C_i, n) \mid C_i \in C\}$, hence A =
{ab(X,N)}.

Intuitively, given an observation set OBS, represented 
by a set of unit clauses, we have a choice of how to use 
it. We either wish to predict it, giving an abductive diag- 
nosis, or make assumptions to restore the theory to con- 
sistency, giving a consistency-based diagnosis. By 
adding OBS to the integrity constraints, only models in 
which the observations are true, and hence explained by 
the system description together with selected abduci- 
bles, are legal generalised stable models. Hence we get 
an abductive diagnosis. If, instead, we add OBS to the 
logic program representing the system description, then 
a set of assumptions can only be made if they are con- 
sistent with the observations; i.e. the observations, sys- 
tem description and assumptions cannot derive 
anything which violates the integrity constraints. This 
will give us consistency-based diagnoses. Furthermore, 
we can partition OBS into two sets, and predict some
observations, $OBS_p$, while maintaining consistency with
others, $OBS_c$. We do this by placing $OBS_p$ in the
integrity constraints, and $OBS_c$ in the logic program.

This allows us to give a definition of unified diagnosis 
as follows; 

Definition 4 

Let $\langle SD, OBS_p, OBS_c, C \rangle$ be a diagnosis problem, where;

SD is a logic program with integrity constraints, <P,IC>.

$OBS_p$ is the set of observations to be predicted by
diagnoses.

$OBS_c$ is the set of observations which diagnoses need to
be consistent with.

C is the set of causal clusters in the system.

Then;

$\Delta$ is a GSM-diagnosis of $\langle SD, OBS_p, OBS_c, C \rangle$ iff there is
a generalised stable model, $M(\Delta)$, of the abductive
framework $\langle P \cup OBS_c, A, IC \cup OBS_p \rangle$,

where A = {ab(C,N)} represents the set of possible root
causes of misbehaviour in SD.

To demonstrate this, we consider a simple example
from the medical domain, that of pericardial tamponade.
The heart consists of two parts: the myocardium is
the muscle which beats, while the pericardium is the
protective sac which surrounds this muscle. If this sac is
pierced, instantaneous pain occurs, which can subside 
fairly quickly. However, blood slowly flows into the 
pericardium over a period of time, increasing the pres- 
sure on the myocardium. Later, the myocardium will 
become so compressed that blood does not flow round 
the arteries, even though the myocardium itself is func- 
tioning perfectly. 

The model of this phenomenon is given below. For sim- 
plicity, we treat time discretely, in units of hours. 

pulse_ok(T) <- normal_cardiac_contraction(T),
               not heart_compressed(T).

no_pulse(T) <- heart_compressed(T).

heart_compressed(T) <- ab(pericardium,pierced(T')),
                       T' < T - 10.

normal_cardiac_contraction(T) <-
               not ab(myocardium,failure(T')),
               T' < T.

bad_ecg(T) <- ab(myocardium,failure(T)).

We give the pericardium the possible failure cause
‘pierced’ at a given time, while the myocardium simply
suffers a ‘failure’ of some sort. The latter is consistent
with any behaviour of the myocardium, but only predicts
a bad ecg trace.

The above clauses form the logic program part of SD. In
addition, we need the integrity constraints, IC. These
simply state which observations conflict with each
other;

$\neg$(pulse_ok(T) & no_pulse(T)).

$\neg$(ecg_bad(T) & ecg_good(T)).

Assume we have the observation, no_pulse(12). Let us
consider the generalised stable models of <P,A,IC>.

If we place the observation in the logic program as a
unit clause, any set of abducibles can be assumed as
long as they do not violate the integrity constraints - i.e.
they must not generate a stable model in which
pulse_ok(12) is true. If we assume nothing, the resulting
stable model contains pulse_ok(12), resulting in a
conflict. There are two possible (minimal) ways to
restore consistency. We can assume
ab(myocardium,failure(10))¹, so that the stable model
ceases to contain normal_cardiac_contraction(12).
Alternatively, we can assume ab(pericardium,pierced(2))¹,
which predicts heart compression at time 12. The
resulting stable model will therefore not contain
pulse_ok(12), and so be a legitimate generalised stable
model of $\langle P \cup \{no\_pulse(12)\}, A, IC \rangle$.

If, instead, we place the observation in the integrity 
constraints, IC, we are restricted to stable models which 
contain no_pulse(12). In this case, only by assuming 
ab(pericardium,pierced(2)) do we generate a stable model 
which contains no_pulse(12). As this also satisfies IC, it 
is a legitimate GSM for $\langle P, A, IC \cup \{no\_pulse(12)\} \rangle$.

Hence, by making a choice of where to place the observation,
we can generate either consistency-based or
abductive diagnoses. Furthermore, if we have a second
observation, ecg_good(12), we can choose to treat it in a
different way from the first. Let $OBS_p$ = {no_pulse(12)}
and $OBS_c$ = {ecg_good(12)}. In this case, the only (minimal)
GSM of $\langle P \cup OBS_c, A, IC \cup OBS_p \rangle$ is that generated
by ab(pericardium,pierced(2)). However, if we swap
$OBS_p$ and $OBS_c$, the only (minimal) GSM is that generated
by ab(myocardium,failure(10)).
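The effect of placing no_pulse(12) in the program or in the integrity constraints can be reproduced on a small scale. The sketch below is an assumption-laden illustration: the model is grounded by hand at the time points used in the text (so the temporal inequalities are not evaluated), only the two abducibles mentioned in the text are considered, and the function names and clause encoding `(head, positive_body, negative_body)` are invented for the example.

```python
from itertools import chain, combinations

def subsets(items):
    ordered = sorted(items)
    return map(set, chain.from_iterable(
        combinations(ordered, r) for r in range(len(ordered) + 1)))

def is_stable(program, m):
    """True if m is the least model of the reduct of program w.r.t. m."""
    horn = [(h, set(pos)) for h, pos, neg in program if not set(neg) & m]
    model, changed = set(), True
    while changed:
        changed = False
        for h, pos in horn:
            if h not in model and pos <= model:
                model.add(h)
                changed = True
    return model == m

def gsm_diagnoses(program, abducibles, ics, obs_p, obs_c, atoms):
    """Hypothesis sets delta with a GSM of <P u OBS_c, A, IC u OBS_p>."""
    prog = program + [(o, [], []) for o in obs_c]
    found = []
    for delta in subsets(abducibles):
        facts = prog + [(a, [], []) for a in delta]
        for m in subsets(atoms):
            if (is_stable(facts, m) and m & set(abducibles) == delta
                    and all(ic(m) for ic in ics) and set(obs_p) <= m):
                found.append(delta)
                break
    return found

# Hand-grounded clauses for time 12 (an assumption of this sketch).
PIERCED = "ab(pericardium,pierced(2))"
FAILURE = "ab(myocardium,failure(10))"
P = [
    ("pulse_ok(12)", ["normal_cardiac_contraction(12)"], ["heart_compressed(12)"]),
    ("no_pulse(12)", ["heart_compressed(12)"], []),
    ("heart_compressed(12)", [PIERCED], []),
    ("normal_cardiac_contraction(12)", [], [FAILURE]),
]
ATOMS = {PIERCED, FAILURE, "pulse_ok(12)", "no_pulse(12)",
         "heart_compressed(12)", "normal_cardiac_contraction(12)"}
IC = [lambda m: not {"pulse_ok(12)", "no_pulse(12)"} <= m]

# Observation placed in the program: consistency-based diagnoses.
consistency = gsm_diagnoses(P, [PIERCED, FAILURE], IC, [], ["no_pulse(12)"], ATOMS)
# Observation placed in the integrity constraints: abductive diagnoses.
abductive = gsm_diagnoses(P, [PIERCED, FAILURE], IC, ["no_pulse(12)"], [], ATOMS)
```

Under these assumptions, the empty hypothesis set is rejected in both cases, the consistency-based reading accepts either fault on its own, and the abductive reading accepts only hypothesis sets containing the pierced pericardium.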

Note how the model uses negation-as-failure to handle 
the frame problem. If we used classical negation 
instead, it would be necessary to have extra clauses to 
predict not_heart_compressed at all relevant times, 
resulting in a larger, less understandable, and less effi- 
cient model. 

8 Abductive and consistency-based 
diagnosis as special cases 

If we restrict our attention to the traditional definitions 
of diagnosis, we can show that our definition is equiva- 
lent to these under certain conditions. 

¹ Or, of course, at any other appropriate time instant.
8.1 Abductive Diagnoses as Generalised
Stable Models 

If all the observations are to be predicted in the abductive
sense, and the system description contains only
Horn clauses, our definition of diagnosis reduces to the
standard definition of abduction given in section 1. This
is achieved as follows:

Given an abductive diagnosis problem $\langle SD, OBS_p, C \rangle$,
where SD is a Horn-clause theory, divide the system
description into a set of definite clauses, P, and a set of
denials, D. Let A be the set of abducibles.

It is easy to show that abductive diagnoses of SD
according to formula (2) correspond to generalised stable
models of the framework $\langle P, A, IC \cup OBS_p \rangle$.

8.2 Consistency-Based Diagnoses as 
Generalised Stable Models 

For a certain class of theories, namely almost-horn the- 
ories, we show that our definition of diagnosis is equiv- 
alent to the traditional definition of consistency-based 
diagnosis given in [Reiter 1987]. An almost-horn theory 
is a theory in which negation is used only to represent 
the negation of certain predicates. In the context of our 
theorem, these correspond to the abnormality assump- 
tions. 

Definition 5 

A clause is said to be almost-Horn with respect to A if,
when in disjunctive normal form, it contains at most
one positive literal with a predicate symbol not in A.

Theorem 

Let $\langle SD, OBS_c, C \rangle$ be a consistency-based diagnosis
problem, with SD a theory which is almost-Horn with
respect to A={ab}.

Then define the logic program with integrity constraints,
SD’=<P,IC>, as follows;

Let $a_i \in atoms(A)$, and $p, q_i \notin atoms(A)$.

1. For every clause of the form

$p \leftarrow \neg a_1, \neg a_2, \ldots, \neg a_k, a_{k+1}, \ldots, a_m, q_1, q_2, \ldots, q_n$

in SD, there is a program clause

$p \leftarrow not\ a_1, not\ a_2, \ldots, not\ a_k, a_{k+1}, \ldots, a_m, q_1, q_2, \ldots, q_n$

in P.

2. For every clause of the form

$a_1 \vee a_2 \ldots \vee a_k \vee \neg a_{k+1} \vee \ldots \vee \neg a_m \vee \neg q_1 \vee \neg q_2 \vee \ldots \vee \neg q_n$

in SD there is an identical clause in IC.

Then;

D is a consistency-based diagnosis of $\langle SD, OBS_c, C \rangle$
according to formula (1)

if and only if

D is a GSM-diagnosis of $\langle SD', \emptyset, OBS_c, C \rangle$.

The proof of this theorem is available in an extended 
version of this paper, available from the authors. 

This theorem shows that, if negation is used only to represent
the normality assumptions in the system, $\neg ab$,
then the nonmonotonic definition of diagnosis given by 
us is equivalent to the monotonic definition given in 
[Reiter 1987]. However, if negation is used elsewhere 
in the theory, the two definitions diverge. The classical 
consistency-based definition requires explicit represen- 
tation of all negative information. The GSM-diagnosis, 
however, will make the closed-world assumption, and 
assume information is false unless it can be proved oth- 
erwise. 

9 Comparison with Console & Torasso [2] 

Console & Torasso have defined a framework for a gen- 
eral abduction problem. This framework allows a spec- 
trum of diagnosis styles to be represented within it, 
including the pure consistency-based and abductive 
styles described above. 

They divide the observations into two sets. One set,
$OBS_a$, is to be explained by the assumptions, while the
other set, $OBS_c$, must be consistent with the assumptions.
They then define two sets;

$\Psi^+ = OBS_a$

$\Psi^- = \{\neg f(x) \mid f(y) \in OBS_c, x \neq y\}$

A diagnosis is then a set of abducibles which, when
added to the theory, allows prediction of all observations
in $\Psi^+$, and is consistent with the negative literals in
$\Psi^-$.
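The construction of the two sets can be illustrated with a short sketch. It assumes, purely for the example, that each observation is a pair `(parameter, value)` and that every parameter has a known finite domain of values (the `domains` argument); neither the encoding nor the function name `psi_sets` comes from Console and Torasso.

```python
def psi_sets(obs_a, obs_c, domains):
    """Return (psi_plus, psi_minus) for (parameter, value) observations.

    psi_minus holds pairs (f, x), each read as the negative literal "not f(x)".
    """
    psi_plus = set(obs_a)
    psi_minus = {(f, x)
                 for (f, y) in obs_c
                 for x in domains[f]
                 if x != y}
    return psi_plus, psi_minus

# Hypothetical observations and value domains, for illustration only.
plus, minus = psi_sets(
    obs_a={("ecg", "bad")},
    obs_c={("pulse", "absent")},
    domains={"pulse": {"ok", "absent", "weak"}, "ecg": {"good", "bad"}},
)
```

Here the consistency observation pulse(absent) contributes the negative literals for every other pulse value, so a diagnosis may not predict pulse(ok) or pulse(weak), but it need not predict pulse(absent) either.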

Our definition is more powerful in several ways. 

• It extends the definition of Console and Torasso
from Horn-clause theories to general logic
programs with integrity constraints. This gives
a sophisticated and expressive language for
modelling, which includes negation as failure.

• The inclusion of the consistency-based obser- 
vations in the object level, rather than their ne- 
gations in the integrity constraints, means that 
these can be used easily during inference. This 
can reduce the time to find a conflict, by using 
‘backwards simulation’ of components. In 
some cases, such as the example documented in 
[van Soest et al. 1990] , certain diagnoses can- 
not be found without access to the observations 
in this way. 

• Within this framework, it is possible to define 
minimal diagnoses model-theoretically. We 
will expand on this in section 10. 

Placing the consistency-based observations at the object 
level potentially gives us more efficient inference. 
However, to do this in the context of joint diagnoses can 
lead to problems. 

It may be possible to conclude that an abductive observation
is true, based on the adding of a consistency-based
observation to the theory alone;

SD: obs1 $\rightarrow$ obs2
$OBS_a$: obs2
$OBS_c$: obs1

By adding obs1 to the system description, we can conclude
that obs2 is true. Whether this is legitimate
depends on how we interpret the consistency-based 
observations. If we consider them true, but not neces- 
sarily explainable, then this is legitimate. This is the 
case in Reiter’s formalisation of diagnosis, and also in 
the case of the setting factors of Reggia et al. [1983]. 
However, if we consider them not necessarily true, 
merely not false, then this is unacceptable. In such cir- 
cumstances, it is necessary to restrict the model so that 
consistency-based observations do not appear in the 
body of clauses, or use the approach proposed by Con- 
sole and Torasso. 

10 Minimality 

We now focus attention on component-based diagnosis, 
and consider the problem of minimal diagnoses. We 
wish to restrict our attention to those diagnoses which 
contain a minimal number of failing components. 

To do this, we introduce minimal generalised stable 
models; 

Definition: 

A generalised stable model, $M(\Delta)$, for an abductive
framework, <P,A,IC>, is minimal if there is no other GSM,
$M(\Delta')$, such that $\Delta' \subset \Delta$.

Hence, a minimal generalised stable model contains a minimal
set of assumptions which allow the consequences of
the logic program P to satisfy the integrity constraints,
IC. Note that, because abductive frameworks are nonmonotonic,
this does not imply that any superset of $\Delta$, $\Phi$,
will have a GSM, $M(\Phi)$.

If, in our diagnosis framework, we have a 1-1 corre- 
spondence between a hypothesised failed component 
and an abducible being assumed in the abductive frame- 
work, then minimal general stable models will corre- 
spond to minimal diagnoses. To do this, we must impose 
two restrictions on the relationship between the frame- 
works; 

(i) There must be no abducible representing the correct 
behaviour of a component. This must instead be a de- 
fault behaviour which is used in the absence of abduci- 
bles referring to the faulty behaviour of a component. 

(ii) It must be illegal to make more than one assumption 
about a component’s behaviour at a time. 

Note that the second condition does not force fault
modes to be mutually exclusive in real-life, merely that
they must be mutually exclusive logically. This can easily
be achieved by adding an integrity constraint forbidding
a component to have two modes;

false <- ab($C_i$, $m_{j1}$), ab($C_i$, $m_{j2}$), $m_{j1} \neq m_{j2}$.

The framework provided by Console and Torasso satisfies
the second of these conditions, but not the first. Because
they work in a monotonic framework, it is not possible
to represent the correct behaviour of a component
as the default behaviour; instead, it must be explicitly
assumed that a component behaves correctly.

As a result of this, they must specify a semantic minimi- 
sation criterion; a diagnosis is minimal if it contains a 
minimal set of abducibles corresponding to faulty be- 
haviour. We, however, can specify a model theoretic cri- 
terion; 

A diagnosis, $\Delta$, is minimal if its corresponding GSM,
$M(\Delta)$, is a minimal GSM.
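Given the hypothesis sets of all GSMs of a framework, the minimality criterion amounts to discarding every set that has a proper subset among the candidates. A minimal sketch, in which the function name `minimal_diagnoses` and the string encoding of abducibles are assumptions for illustration:

```python
def minimal_diagnoses(candidates):
    """Keep the hypothesis sets that have no proper subset among the candidates."""
    sets = [frozenset(c) for c in candidates]
    return [set(s) for s in sets if not any(other < s for other in sets)]

# Example: the consistency-based candidates for the no_pulse(12) observation.
candidates = [{"failure"}, {"pierced"}, {"failure", "pierced"}]
minimal = minimal_diagnoses(candidates)
```

Because minimality is defined only over hypothesis sets that actually have a GSM, the filter must be applied to the complete list of candidates; it does not assume that supersets of a diagnosis are themselves diagnoses.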

11 Calculating Diagnoses 

By providing a uniform model-theoretic framework for 
consistency-based, abductive and joint diagnoses, we 
have also provided a method for a uniform implementa- 
tion. We simply need an algorithm for generating the 
minimal generalised stable models of an abductive 
framework, and we can use this for performing a variety 
of diagnosis tasks. 

Much work has been carried out on the generation of 
stable models, and several efficient algorithms exist. 
However, as general stable models are a newer innova- 
tion, these results have yet to be fully exploited and 
extended to the GSM case. Currently, the state of the art 
in GSM generation is provided by Satoh and Iwayama 
[1991]. This work, however, has the drawback that it 
does not produce minimal GSMs. 

Traditionally, in the abductive community, top-down
algorithms have been used which tend to generate minimal
solutions, as they avoid making irrelevant assumptions
(e.g. [Cox and Pietrzykowski 1986], [Kakas and
Mancarella 1990b]). However, non-minimal abductive
diagnoses are still acceptable in the model-theoretic
semantics, and can be generated by the algorithms.
Similarly, in the diagnosis community, generation of 
minimal diagnoses has tended to be a consequence of 
the algorithm selected (e.g. the ATMS in [deKleer and 
Williams 1987]) rather than a model-theoretic restric- 
tion. 

However, Eshghi [1990] proposes an alternative 
approach. He generates a theory in which minimal diag- 
noses correspond exactly to the stable models of the 
theory. This means that non-minimal diagnoses are 
excluded by the semantics, rather than the algorithm. 
By extending these results beyond the almost-Horn case,
we are able to transform an abductive framework into a
logic program. The stable models of this logic program
correspond exactly to the minimal generalised stable 
models of the abductive framework. This means that 
minimality is brought into the theory as a necessary 
property of each solution, rather than being a selection 
criterion between solutions. This work is currently in 
progress. 

As a result of this, a wider variety of literature can be 
used to select appropriate and efficient algorithms, 
rather than being restricted to algorithms which have 
been developed specifically for the task of diagnosis. 

12 Conclusions 

By moving to a nonmonotonic logical framework, it is 
possible to bring abductive and consistency-based diag- 
nosis together, and use the same inference method to 
perform both. We have done this by using generalised 
stable models to provide the semantics, which provides 
us with a rich and expressive modelling language. It 
also gives a link between diagnosis and logic program- 
ming, allowing application of theoretical and practical 
logic programming results to the domain of diagnosis. 
Abstract

A forward-chaining hypothetical reasoner with the
assumption-based truth maintenance system (ATMS) 
has some advantages such as avoiding repeated proofs. 
However, it may prove subgoals unrelated to proofs of 
the given goal. To simulate top-down reasoning on 
bottom-up reasoners, we can apply the upside-down 
meta-interpretation method to hypothetical reasoning. 
Unfortunately, when programs include negative clauses, 
it does not achieve speedups because checking the consis- 
tency of solutions by negative clauses should be globally 
evaluated. This paper describes a new program transformation
algorithm for efficient forward-chaining hypothetical
reasoning. In the transformation algorithm,
logical dependencies between a goal and negative clauses 
are analyzed to find irrelevant negative clauses, so that 
the forward-chaining hypothetical reasoners based on the 
upside-down meta-interpretation can restrict consistency 
checking of negative clauses to those relevant clauses. 
The transformed program has been evaluated with a 
logic circuit design problem. 

1 Introduction 

Hypothetical reasoning [Inoue 88] is a technique for proving
the given goal from axioms together with a set of
hypotheses that do not contradict the axioms. Hypothetical
reasoning is related to abductive reasoning and
default reasoning.

A forward-chaining hypothetical reasoner can be constructed
by simply combining a bottom-up reasoner
with the assumption-based truth maintenance system
(ATMS) [de Kleer 86-1] (for example [Flann et al. 87,
Junker 88]). We have implemented a forward-chaining
hypothetical reasoner [Ohta and Inoue 90], called
APRICOT/O, which consists of the RETE-based inference
engine [Forgy 82] and the ATMS. With this architecture,
we can reduce the total cost of the label computations
of the ATMS by giving intermediate justifications
to the ATMS at two-input nodes in the RETE-like
networks. On the other hand, hypothetical reasoning
based on top-down reasoning has been proposed
in [Poole et al. 87, Poole 91]. Compared with top-down
(backward-chaining) hypothetical reasoning, bottom-up
(forward-chaining) hypothetical reasoning has the
advantage of avoiding duplicate proofs of repeated subgoals
and duplicate proofs among different contexts. Bottom-up
reasoning, however, has the disadvantage of proving
unnecessary subgoals that are unrelated to the proofs of
the goal.

To avoid the disadvantage of bottom-up reasoning,
the Magic Set method [Bancilhon et al. 86] and the Alexander
method [Rohmer et al. 86] have been proposed for
deductive database systems. Recently, it has been shown that
the Magic Set and Alexander methods can be interpreted as
specializations of the upside-down meta-interpretation
[Bry 90]. The upside-down meta-interpretation has been
extended to abduction and deduction with non-Horn
clauses in [Stickel 91]. His abduction, however, does not
require the consistency of solutions.

Since the consistency requirement is crucial for some 
applications, we would like to make programs include 
negative clauses for our hypothetical reasoning. When 
programs include negative clauses, however, the upside- 
down meta-interpretation method does not achieve 
speedups because checking the consistency of solutions 
by negative clauses should be globally evaluated. 

We present a new program transformation algorithm
for efficient forward-chaining hypothetical reasoning
based on the upside-down meta-interpretation. In
the transformation algorithm, logical dependencies be- 
tween a goal and negative clauses are analyzed to find 
irrelevant negative clauses, so that the forward-chaining 
hypothetical reasoners based on the upside-down meta- 
interpretation can restrict consistency checking of nega- 
tive clauses to those relevant clauses. The transformed 
program has been evaluated with a logic circuit design 
problem. 

In Section 2, our hypothetical reasoning is defined with
the default proofs [Reiter 80]. In Section 3, the outline
of the ATMS is sketched. Section 4 shows the basic algorithm
for hypothetical reasoning based on the bottom-up
reasoner MGTP [Fujita and Hasegawa 91] together with
the ATMS. Section 5 presents two transformation algorithms
based on the upside-down meta-interpretation.
One is a simple transformation algorithm, the other is 
the transformation algorithm with the abstracted depen- 
dency analysis. We have implemented the hypothetical 
reasoner and these program transformation systems, and 
Section 6 shows the result of an experiment for the evalu- 
ation of the transformed programs. In Section 7, related 
works are considered. 

2 Problem Definition 

In this section, we define our hypothetical reasoning 
based on a subset of normal default theories [Reiter 80]. 
A normal default theory (D, W) and a goal G are given
as follows:

• W: a set of Horn clauses.

A Horn clause is represented in an implicational
form,

$$\alpha_1 \wedge \cdots \wedge \alpha_n \rightarrow \beta \qquad (1)$$

or

$$\alpha_1 \wedge \cdots \wedge \alpha_n \rightarrow \bot. \qquad (2)$$

Here, $\alpha_i$ ($1 \leq i \leq n$; $n \geq 0$) and $\beta$ are atomic
formulas, and $\bot$ designates falsity. Function symbols
are restricted to 0-ary function symbols. All
variables in a clause are assumed to be universally
quantified in front of the clause. Each Horn clause
has to be range-restricted, that is, all variables in
the consequent $\beta$ have to appear in the antecedent
$\alpha_1 \wedge \cdots \wedge \alpha_n$. A Horn clause of the form (2) is called
a negative clause.

• D: a set of normal defaults.

A normal default is an inference rule,

$$\frac{\alpha : \beta}{\beta},$$

where $\alpha$, called the prerequisite of the normal default,
is restricted to a conjunction $\alpha_1 \wedge \cdots \wedge \alpha_n$ of
atomic formulas and $\beta$, called its consequent, is restricted
to an atomic formula. Function symbols are
restricted to 0-ary function symbols. All variables in
the consequent $\beta$ have to appear in the prerequisite
$\alpha$. A normal default with free variables is identified
with the set of its ground instances. The normal
default can be read as “if $\alpha$ and it is consistent to
assume $\beta$, then infer $\beta$”.

• goal G: a conjunction of atomic formulas. 

All variables in G are assumed to be existentially 
quantified. 



Let $\Delta$ be the set of all ground instances of the normal
defaults of D. A default proof [Reiter 80] of G with respect
to (D, W) is a sequence $\Delta_0, \ldots, \Delta_k$ of subsets of
$\Delta$ if and only if

1. $W \cup CONSEQUENTS(\Delta_0) \vdash G$,

2. for $1 \leq i \leq k$,
$W \cup CONSEQUENTS(\Delta_i) \vdash PREREQUISITES(\Delta_{i-1})$,

3. $\Delta_k = \emptyset$,

4. $W \cup \bigcup_{i=0}^{k} CONSEQUENTS(\Delta_i)$ is consistent,

where

$$PREREQUISITES(\Delta_{i-1}) = \bigwedge \alpha \quad \text{for } (\alpha : \beta/\beta) \in \Delta_{i-1}$$

and

$$CONSEQUENTS(\Delta_i) = \{\beta \mid (\alpha : \beta/\beta) \in \Delta_i\}.$$

A ground instance $G\theta$ of the goal G is an answer to G
from (D, W) if

$$W \cup \bigcup_{i=0}^{k} CONSEQUENTS(\Delta_i) \models G\theta,$$

where the sequence $\Delta_0, \ldots, \Delta_k$ is a default proof of
G with respect to (D, W). If $G\theta$ is an answer to
G from (D, W), $\theta$ is an answer substitution for G
from (D, W). A support for an answer $G\theta$ from
(D, W) is $\bigcup_{i=0}^{k} CONSEQUENTS(\Delta_i)$, where the
sequence $\Delta_0, \ldots, \Delta_k$ is a default proof of $G\theta$ with respect
to (D, W). For an answer $G\theta$ from (D, W), the minimal
supports for $G\theta$ from (D, W), written as $MS(G\theta)$,
is the set of minimal elements in all supports for $G\theta$ from
(D, W). The solution to G from (D, W) is the set of all
pairs $(G\theta, MS(G\theta))$, where $G\theta$ is an answer to G from
(D, W) and $MS(G\theta)$ is the minimal supports for $G\theta$.
The task of our hypothetical reasoning is defined to find
the solution to a given goal from a given normal default
theory.

3 ATMS 

The ATMS [de Kleer 86-1] is used as one component of 
our hypothetical reasoner. The following is the outline 
of the ATMS. 

In the ATMS, a ground atomic formula is called a datum.
For some datum N, $\Gamma_N$ designates an assumption.
The ATMS treats both $\bot$ and $\Gamma_N$ as special data. The
ATMS represents each datum as an ATMS node:

(datum, label, justifications).

Justifications correspond to ground Horn clauses and are
incrementally input to the ATMS. Each justification is
denoted by:

$$N_1, \ldots, N_n \Rightarrow N,$$

where $N_i$ and N are data. Each datum $N_i$ is called an
antecedent, and the datum N is called a consequent. In
the slot justifications, the ATMS records the set of antecedents
of justifications whose consequents correspond
to the datum.

Let H be a current set of assumptions. An assumption
set $E \subseteq H$ is called an environment. When we denote
an environment by a set of assumptions, each assumption
$\Gamma_N$ is written as N by omitting the letter $\Gamma$. Let J be a
current set of justifications. An environment E is called
nogood if $J \cup E$ derives $\bot$. The label of the datum N is the
set of environments $\{E_1, \ldots, E_i, \ldots, E_m\}$ that satisfies
the following four properties [de Kleer 86-1]:

1. N holds in each $E_i$ (soundness),

2. every environment in which N holds is a superset of
some $E_i$ (completeness),

3. each $E_i$ is not nogood (consistency),

4. no $E_i$ is a subset of any other (minimality).

If the label of a datum is not empty, the datum is be- 
lieved; otherwise it is not believed. A basic algorithm 
to compute labels [de Kleer 86-1] is as follows. When 
a justification is incrementally input to the ATMS, the 
ATMS updates the labels relevant to the justification in 
the following procedure. 

Step 1: Let L be the current label of the consequent
N of the justification and $L_i$ be the current label
of the i-th antecedent $N_i$ of the justification. Set
$L' = L \cup \{x \mid x = \bigcup_{i=1}^{n} E_i, \text{ where } E_i \in L_i\}$.

Step 2: Let $L''$ be the set obtained by removing nogoods
and subsumed environments from $L'$. Set the
new label of N to $L''$.

Step 3: Finish this updating if L is equal to the new
label.

Step 4: If N is $\bot$, then remove all new nogoods from
labels of all data other than $\bot$.

Step 5: Update labels of the consequents of the
recorded justifications which contain N as their
antecedents.
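Steps 1 and 2 of the procedure above can be sketched for a single justification as follows. The representation is an assumption made for illustration: environments are `frozenset`s of assumption names, a label is a set of environments, and `nogoods` is the collection of currently known nogood environments. Steps 3 to 5 (termination test, nogood removal from other labels, and propagation) are deliberately left out.

```python
from itertools import product

def update_label(label, antecedent_labels, nogoods):
    """Compute the new label L'' of a consequent from one justification."""
    # Step 1: pick one environment from each antecedent label and union them.
    combined = {frozenset().union(*envs)
                for envs in product(*antecedent_labels)}
    candidates = set(label) | combined
    # Step 2a: remove environments that contain a known nogood.
    consistent = {env for env in candidates
                  if not any(nogood <= env for nogood in nogoods)}
    # Step 2b: remove environments subsumed by a strictly smaller one.
    return {env for env in consistent
            if not any(other < env for other in consistent)}
```

For example, with antecedent labels {{A}, {B}} and {{C}} and the nogood {B, C}, the candidate {B, C} is discarded and the new label is {{A, C}}. A justification with no antecedents yields the single empty environment, so its consequent holds unconditionally.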



4 Hypothetical Reasoner with 
ATMS and MGTP 

The MGTP [Fujita and Hasegawa 91] is a model generation
theorem prover for checking the unsatisfiability of
a first-order theory P. Each clause in P is denoted by:

$$\alpha_1 \wedge \cdots \wedge \alpha_n \rightarrow \beta_1 \vee \cdots \vee \beta_m,$$

where $\alpha_i$ ($1 \leq i \leq n$; $n \geq 0$) and $\beta_j$ ($1 \leq j \leq m$; $m \geq 0$)
are atomic formulas and all variables in $\beta_1 \vee \cdots \vee \beta_m$
have to appear in $\alpha_1 \wedge \cdots \wedge \alpha_n$. Each clause in P is
translated into a KL1 [Ueda and Chikayama 90] clause.
Then, model candidates are generated from the set of
KL1 clauses. The MGTP works as a bottom-up reasoner
on the distributed-memory multiprocessor called Multi-PSI.

As shown in Figure 1, we can construct a hypothetical
reasoner by combining the MGTP with the ATMS.
The normal default theory (D, W) is translated into a
program P,

$$P = \{\alpha_1 \wedge \cdots \wedge \alpha_n \rightarrow assume(\beta) \mid (\alpha_1 \wedge \cdots \wedge \alpha_n : \beta/\beta) \in D\} \cup W,$$

where assume is a metapredicate not appearing anywhere
in D and W.



[Diagram: the inference engine (MGTP) passes justifications
to the ATMS, and the ATMS returns beliefs to the MGTP.]

Figure 1: Forward-Chaining Hypothetical Reasoner
with ATMS and MGTP



procedure R(G, P):
begin
  $B_0 := \emptyset$;
  $J_0 := \{(\Rightarrow \beta) \mid (\rightarrow \beta) \in P\}$
         $\cup\ \{(\Gamma_\beta \Rightarrow \beta) \mid (\rightarrow assume(\beta)) \in P\}$;
  s := 0;
  while $J_s \neq \emptyset$ do
  begin
    s := s + 1;
    $B_s$ := UpdateLabels($J_{s-1}$, ATMS);
    $J_s$ := GenerateJustifications($B_s$, P, $B_{s-1}$)
  end;
  Solution := $\emptyset$;
  for each $\theta$ such that $G\theta \in B_s$ do
  begin
    $L_{G\theta}$ := GetLabel($G\theta$, ATMS);
    Solution := Solution $\cup\ \{(G\theta, L_{G\theta})\}$
  end;
  return Solution
end.

Figure 2: Reasoning Algorithm with ATMS and
MGTP



The reasoning procedure R(G, P) for the MGTP with
the ATMS is shown in Figure 2. The procedure consists of
a part that repeats UpdateLabels-GenerateJustifications
cycles and a part that constructs the solution. The cycles
are repeated while $J_s$ is not empty. The ATMS
updates the labels related to a justification set $J_{s-1}$
given by the MGTP, and returns the set $B_s$
of all the data whose labels are not empty after the
update; that is, UpdateLabels($J_{s-1}$, ATMS) returns a
believed data set $B_s$. The MGTP generates each set $J_s$
of justifications by matching elements of $B_s$ with the
antecedent of every clause related to new believed data;
that is, GenerateJustifications($B_s$, P, $B_{s-1}$) returns a
new justification set $J_s$. If any element in $(B_s \setminus B_{s-1})$ can match
an element of the antecedent of any $(\alpha_1 \wedge \cdots \wedge \alpha_n \to X)$
in P and there exists a ground substitution $\sigma$ such that
$\alpha_i\sigma \in B_s$ for all $\alpha_i$, then $J_s$ is as follows.

• $(\alpha_1\sigma, \ldots, \alpha_n\sigma, \Gamma_{\beta\sigma} \Rightarrow \beta\sigma) \in J_s$ if $X = assume(\beta)$.

• $(\alpha_1\sigma, \ldots, \alpha_n\sigma \Rightarrow \beta\sigma) \in J_s$ if $X = \beta$.

• $(\alpha_1\sigma, \ldots, \alpha_n\sigma \Rightarrow \bot) \in J_s$ if $X = \bot$.

The procedure GetLabel($G\theta$, ATMS) returns the label
of $G\theta$ and is used in constructing the solution. Note
that the label of $G\theta$ corresponds to the minimal
supports for $G\theta$. The hypothetical reasoner with the ATMS
and the MGTP can avoid duplicate proofs among different
contexts and repeated proofs of subgoals. However,
there may be many unnecessary proofs unrelated to the
proofs of the goal.



5 Upside-Down
Meta-Interpretation

5.1 Simple Transformation Algorithm 

Bottom-up reasoning has the disadvantage of unnecessarily
proving subgoals that are not related to proofs of
the given goal. We introduce a simple transformation
of a program P, based on upside-down meta-interpretation,
that speeds up bottom-up reasoning by
incorporating goal information. A bottom-up reasoner
interprets a Horn clause

$\alpha_1 \wedge \cdots \wedge \alpha_n \to \beta$

in such a way that the fact $\beta\sigma$ is derived if facts
$\alpha_1\sigma, \ldots, \alpha_n\sigma$ are present for some substitution $\sigma$. On
the other hand, a top-down reasoner interprets it in such
a way that goals $\alpha_1\sigma, \ldots, \alpha_n\sigma$ are derived if a goal $\beta\sigma$
is present, and the fact $\beta\sigma$ is derived if both a goal $\beta\sigma$ and
facts $\alpha_1\sigma, \ldots, \alpha_n\sigma$ are present. If we transform the Horn
clause

$\alpha_1 \wedge \cdots \wedge \alpha_n \to \beta$

into

$goal(\beta) \to goal(\alpha_i)$
for every $\alpha_i$ ($1 \le i \le n$) and

$goal(\beta) \wedge \alpha_1 \wedge \cdots \wedge \alpha_n \to \beta$,

then a bottom-up reasoner can simulate top-down reasoning.
Here, goal is a metapredicate symbol which does
not appear in the original program P. After some facts
related to the proofs of the goal have been derived with the
upside-down meta-interpretation, those facts may lead to a
contradiction under bottom-up interpretation of the original
program. Thus, we transform each negative clause

$\alpha_1 \wedge \cdots \wedge \alpha_n \to \bot$

into

$\alpha_1 \wedge \cdots \wedge \alpha_n \to \bot$

and

$\to goal(\alpha_i)$

for every $\alpha_i$ ($1 \le i \le n$). This means that every subgoal
related to negative clauses is evaluated.

Note that $(goal(\beta) \to goal(\alpha_i))$ or $(\to goal(\alpha_i))$ may
not satisfy the range-restricted condition. There are
several techniques that make every clause in a transformed
program range-restricted. Here, we take a very simple
one, in which only the predicate symbols are used
as the arguments of the metapredicate goal. When $\gamma$ is
an atomic formula, we denote by $\bar{\gamma}$ the predicate symbol
of $\gamma$. The algorithm T1 shown in Figure 3 transforms
an original program P into a program in which the
top-down information is incorporated. The solution to
G from T1(G, P) is always the same as the solution to G
from P because all subgoals related to negative clauses
as well as the given goal are evaluated and every label of
$goal(\bar{\beta})$ for any atomic formula $\beta$ is $\{\emptyset\}$.

For example, consider a program, 

$P_b = \{\ \to penguin(a)$,
    $penguin(X) \to bird(X)$,
    $bird(X) \to assume(fly(X))$,
    $fly(X) \wedge notfly(X) \to \bot$,
    $penguin(X) \to notfly(X)\ \}$.

By the simple transformation algorithm, we get 

$T1(fly, P_b) =$
  $\{\ goal(penguin) \to penguin(a)$,
    $goal(bird) \wedge penguin(X) \to bird(X)$,
    $goal(bird) \to goal(penguin)$,
    $goal(fly) \wedge bird(X) \to assume(fly(X))$,
    $goal(fly) \to goal(bird)$,
    $fly(X) \wedge notfly(X) \to \bot$,
    $\to goal(fly)$,
    $\to goal(notfly)$,
    $goal(notfly) \wedge penguin(X) \to notfly(X)$,
    $goal(notfly) \to goal(penguin)\ \}$
  $\cup\ \{\ \to goal(fly)\ \}$.

Next, consider the goal bird(X). Then, the transformed
program $T1(bird, P_b)$ is the program

$T1(bird, P_b) = \{\cdots\} \cup \{\ \to goal(bird)\ \}$,

where only the last element $(\to goal(fly))$ of $T1(fly, P_b)$
is replaced with $(\to goal(bird))$. Even if the goal
is bird(X), both goal(fly) and goal(notfly) are evaluated
because $\{\cdots\}$ includes $(\to goal(fly))$ and
$(\to goal(notfly))$ for the negative clause. Then, the
computational cost of $R(bird(X), T1(bird, P_b))$ is nearly equal
to the cost of $R(fly(X), T1(fly, P_b))$.



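procedure T1(G, P):
begin
    $\hat{P} := \emptyset$;
    for each $(\alpha_1 \wedge \cdots \wedge \alpha_n \to X) \in P$ do
    begin
        if $X = \bot$ then
        begin
            $\hat{P} := \hat{P} \cup \{\alpha_1 \wedge \cdots \wedge \alpha_n \to \bot\}$;
            for j := 1 until n do
                $\hat{P} := \hat{P} \cup \{\to goal(\bar{\alpha}_j)\}$
        end
        else if $X = assume(\beta)$ then
        begin
            $\hat{P} := \hat{P} \cup \{goal(\bar{\beta}) \wedge \alpha_1 \wedge \cdots \wedge \alpha_n \to assume(\beta)\}$;
            for j := 1 until n do
                $\hat{P} := \hat{P} \cup \{goal(\bar{\beta}) \to goal(\bar{\alpha}_j)\}$
        end
        else if $X = \beta$ then
        begin
            $\hat{P} := \hat{P} \cup \{goal(\bar{\beta}) \wedge \alpha_1 \wedge \cdots \wedge \alpha_n \to \beta\}$;
            for j := 1 until n do
                $\hat{P} := \hat{P} \cup \{goal(\bar{\beta}) \to goal(\bar{\alpha}_j)\}$
        end
    end;
    $\hat{P} := \hat{P} \cup \{\to goal(\bar{G})\}$;
    return $\hat{P}$
end.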

Figure 3: Simple Transformation Algorithm T1 
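
The transformation of Figure 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' KL1 implementation: the clause encoding (a body list plus a head that is an atom, an `("assume", atom)` pair, or `None` for a negative clause), the helper names `pred` and `t1`, and the textual output format are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical encoding: a clause is (body, head). The head is an atom string,
# ("assume", atom) for a default, or None for a negative clause (head = bottom).
Clause = Tuple[List[str], object]


def pred(atom: str) -> str:
    """Predicate symbol of an atom, e.g. 'fly(X)' -> 'fly'."""
    return atom.split("(", 1)[0]


def t1(goal: str, program: List[Clause]) -> List[str]:
    """Return the transformed clauses as strings, without duplicates."""
    out: List[str] = []

    def add(clause: str) -> None:
        if clause not in out:          # the result is a set of clauses
            out.append(clause)

    for body, head in program:
        if head is None:               # negative clause: keep it, seed its subgoals
            add(" & ".join(body) + " -> false")
            for a in body:
                add(f"-> goal({pred(a)})")
        else:
            is_default = isinstance(head, tuple)
            atom = head[1] if is_default else head
            shown = f"assume({atom})" if is_default else atom
            g = f"goal({pred(atom)})"
            add(" & ".join([g] + body) + f" -> {shown}")
            for a in body:             # propagate the goal to every antecedent
                add(f"{g} -> goal({pred(a)})")
    add(f"-> goal({pred(goal)})")      # seed the given goal
    return out


# The penguin program P_b from the text.
P_B: List[Clause] = [
    ([], "penguin(a)"),
    (["penguin(X)"], "bird(X)"),
    (["bird(X)"], ("assume", "fly(X)")),
    (["fly(X)", "notfly(X)"], None),
    (["penguin(X)"], "notfly(X)"),
]

for clause in t1("bird(X)", P_B):
    print(clause)
```

For the goal bird(X) the output still contains `-> goal(fly)` and `-> goal(notfly)`, which is the inefficiency that motivates the next subsection.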



5.2 Transformation Algorithm with 
Abstracted Dependency Analysis 

In this subsection, we describe a static method to find
negative clauses that are irrelevant to the evaluation of the
goal. If we can find such irrelevant negative clauses, then
for every antecedent $\alpha_i$ of each irrelevant clause, we do
not need to add $(\to goal(\alpha_i))$ to the transformed program.
We try to find them by analyzing logical dependencies between
the goal and each negative clause at the abstracted level.
The abstracted dependency analysis ignores all arguments.

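When $\gamma$ is an atomic formula, we denote by the proposition
$\bar{\gamma}$ the predicate symbol of $\gamma$. For each negative
clause C, the proposition false(C) is used as the identifier
of C. For every $(\alpha \to assume(\beta))$, $\bar{\beta}$ is called an
assumable-predicate symbol. For any environment E, its
abstracted environment (denoted by $\bar{E}$) is $\{ \Gamma_{\bar{\beta}} \mid \Gamma_\beta \in E \}$.
The abstracted justifications with respect to P are defined
as:

$\bar{J} = \{ (\bar{\alpha}_1, \ldots, \bar{\alpha}_n, \Gamma_{\bar{\beta}} \Rightarrow \bar{\beta}) \mid (\alpha_1 \wedge \cdots \wedge \alpha_n \to assume(\beta)) \in P \}$
    $\cup\ \{ (\bar{\alpha}_1, \ldots, \bar{\alpha}_n \Rightarrow \bar{\beta}) \mid (\alpha_1 \wedge \cdots \wedge \alpha_n \to \beta) \in P \}$
    $\cup\ \{ (\bar{\alpha}_1, \ldots, \bar{\alpha}_n \Rightarrow false(C)) \mid C = (\alpha_1 \wedge \cdots \wedge \alpha_n \to \bot),\ C \in P \}$.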

Let A be the set of propositions appearing in $\bar{J}$. Note
that A consists of all predicate symbols in P and all
false(C) for $C \in P$. For each proposition N in A, we
compute a set of abstracted environments on which N
depends. Now, we show an algorithm to compute the
set of abstracted environments. This algorithm is obtained
by modifying the label-updating algorithm shown
in Section 3. The modified points are as follows.

1. Replace Step 2 with 

Step 2': Set the new label of N to L' . 

2. Remove Step 4. 

Every proposition in A is labeled with the set of abstracted
environments obtained by applying the modified
algorithm to the abstracted justifications $\bar{J}$. This
label is called the abstracted label of the proposition.
The system that computes the set of abstracted environments
for each proposition is called an abstracted dependency
analyzer. We modify the label-updating algorithm for
two reasons. Firstly, in the
abstracted justifications, every $\bot$ is replaced with the
proposition false(C) for the negative clause C, so that
each abstracted label is always consistent. Thus, we do
not need Step 4. Secondly, each abstracted label may
not be minimal because we replace Step 2 with Step 2'.
If every abstracted label were kept minimal, the theorem
we present below might not hold. For example, let

$P_e = \{\ \to p(a),\ \to p(b),\ \to q(b),\ q(X) \to t(X)$,
    $p(X) \to assume(r(X))$,
    $p(X) \to assume(s(X))$,
    $r(a) \to g,\ r(X) \wedge s(X) \to g$,
    $r(X) \wedge s(X) \wedge t(X) \to \bot\ \}$.

Consider the problem defined with the goal g and $P_e$.
The abstracted label of g is $\{\{r\}, \{r, s\}\}$. The abstracted
label of the negative clause is $\{\{r, s\}\}$. The abstracted
environment $\{r, s\}$ cannot be omitted for g although the
set of minimal elements in the abstracted label of g is
$\{\{r\}\}$.
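procedure T2(G, P):
begin
    $\hat{P} := \emptyset$;
    $\bar{J} := \emptyset$;
    k := 0;
    for each $(\alpha_1 \wedge \cdots \wedge \alpha_n \to X) \in P$ do
    begin
        if $X = \bot$ then
        begin
            k := k + 1;
            $\hat{P} := \hat{P} \cup \{\alpha_1 \wedge \cdots \wedge \alpha_n \to \bot\}$;
            $\bar{J} := \bar{J} \cup \{(\bar{\alpha}_1, \ldots, \bar{\alpha}_n \Rightarrow false(k))\}$
        end
        else if $X = assume(\beta)$ then
        begin
            $\hat{P} := \hat{P} \cup \{goal(\bar{\beta}) \wedge \alpha_1 \wedge \cdots \wedge \alpha_n \to assume(\beta)\}$;
            $\bar{J} := \bar{J} \cup \{(\bar{\alpha}_1, \ldots, \bar{\alpha}_n, \Gamma_{\bar{\beta}} \Rightarrow \bar{\beta})\}$;
            for j := 1 until n do
                $\hat{P} := \hat{P} \cup \{goal(\bar{\beta}) \to goal(\bar{\alpha}_j)\}$
        end
        else if $X = \beta$ then
        begin
            $\hat{P} := \hat{P} \cup \{goal(\bar{\beta}) \wedge \alpha_1 \wedge \cdots \wedge \alpha_n \to \beta\}$;
            $\bar{J} := \bar{J} \cup \{(\bar{\alpha}_1, \ldots, \bar{\alpha}_n \Rightarrow \bar{\beta})\}$;
            for j := 1 until n do
                $\hat{P} := \hat{P} \cup \{goal(\bar{\beta}) \to goal(\bar{\alpha}_j)\}$
        end
    end;
    UpdateAbstractedLabels($\bar{J}$, ADA);
    $L_G$ := GetAbstractedLabel($\bar{G}$, ADA);
    for i := 1 until k do
    begin
        $L_i$ := GetAbstractedLabel(false(i), ADA);
        for each $E_G \in L_G$ do
            for each $E_i \in L_i$ do
                if $E_i \subseteq E_G$ then
                    for $(\bar{\alpha}_1, \ldots, \bar{\alpha}_n \Rightarrow false(i)) \in \bar{J}$ do
                        for j := 1 until n do
                            $\hat{P} := \hat{P} \cup \{\to goal(\bar{\alpha}_j)\}$
    end;
    $\hat{P} := \hat{P} \cup \{\to goal(\bar{G})\}$;
    return $\hat{P}$
end.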

Figure 4: Transformation Algorithm T2 with Ab- 
stracted Dependency Analysis 
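
The abstracted dependency analysis and the subset test used in Figure 4 can be illustrated with a small propositional sketch. It assumes a naive fixpoint computation in place of the modified label-updating algorithm: environments are combined across antecedents, no minimization is performed (Step 2') and no nogoods are removed (Step 4 dropped). The names `abstracted_labels`, `relevant` and the `"assume:..."` keys standing for assumption data are hypothetical.

```python
from itertools import product
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Env = FrozenSet[str]
Just = Tuple[Tuple[str, ...], str]   # (antecedent propositions, consequent)


def abstracted_labels(justs: List[Just], assumables: Iterable[str]) -> Dict[str, Set[Env]]:
    """Propagate abstracted environments to a fixpoint, keeping non-minimal ones."""
    labels: Dict[str, Set[Env]] = {
        f"assume:{h}": {frozenset([h])} for h in assumables
    }
    changed = True
    while changed:
        changed = False
        for ants, cons in justs:
            if any(not labels.get(a) for a in ants):
                continue                       # some antecedent is not yet supported
            for combo in product(*(labels[a] for a in ants)):
                env = frozenset().union(*combo)
                target = labels.setdefault(cons, set())
                if env not in target:
                    target.add(env)
                    changed = True
    return labels


def relevant(label_neg: Set[Env], label_goal: Set[Env]) -> bool:
    """True if some environment of the negative clause is a subset of one of the goal."""
    return any(e_i <= e_g for e_g in label_goal for e_i in label_neg)


# Abstracted justifications of the penguin program P_b.
J_B: List[Just] = [
    ((), "penguin"),
    (("penguin",), "bird"),
    (("bird", "assume:fly"), "fly"),
    (("fly", "notfly"), "false(1)"),
    (("penguin",), "notfly"),
]
labels_b = abstracted_labels(J_B, ["fly"])
print(relevant(labels_b["false(1)"], labels_b["bird"]))   # negative clause vs. goal bird
print(relevant(labels_b["false(1)"], labels_b["fly"]))    # negative clause vs. goal fly

# Abstracted justifications of P_e, where minimizing labels would lose {r, s}.
J_E: List[Just] = [
    ((), "p"), ((), "q"), (("q",), "t"),
    (("p", "assume:r"), "r"), (("p", "assume:s"), "s"),
    (("r",), "g"), (("r", "s"), "g"),
    (("r", "s", "t"), "false(1)"),
]
labels_e = abstracted_labels(J_E, ["r", "s"])
print(sorted(sorted(e) for e in labels_e["g"]))
```

In the $P_e$ case the test succeeds only because the non-minimal environment {r, s} is kept in the label of g; with the minimized label {{r}} the negative clause would wrongly be judged irrelevant.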



Theorem: Let P be a normal default theory and G
a goal, $\bar{J}$ the abstracted justifications with respect to
P, L(G) the abstracted label of G, L(false(C)) the
abstracted label of false(C) where $C \in P$. If no element
in L(false(C)) is a subset of any element in L(G), then
the solution to G from P is equivalent to the solution to
G from $P \setminus \{C\}$.

Sketch of the proof: Let C be $(\alpha \to \bot)$ and P'
be $P \setminus \{C\}$. Assume that $\theta_m$ is any answer substitution
for G from P' and $\sigma_k$ is any answer substitution for $\alpha$
from P'. Let $MS(\alpha\sigma_k)$ be the minimal supports for $\alpha\sigma_k$
from P' and $MS(G\theta_m)$ be the minimal supports for $G\theta_m$
from P'. Suppose that no element in L(false(C)) is a
subset of any element in L(G). From the supposition and the
similarity between ATMS labels and abstracted labels,
no element in $MS(\alpha\sigma_k)$ is a subset of any element in
$MS(G\theta_m)$. Therefore, the solution to G from $P' \cup \{C\}$
is the same as the solution to G from P'. ■

On the basis of the theorem, we can omit consis- 
tency checking for a negative clause C if the condition 
of the theorem is satisfied. The transformation algo- 
rithm T2(G, P) with the abstracted dependency analysis 
is shown in Figure 4 for the program P and the goal G. 
In Figure 4, UpdateAbstractedLabels($\bar{J}$, ADA) denotes
the procedure which computes abstracted labels from abstracted
justifications $\bar{J}$ with the abstracted dependency
analyzer ADA, and GetAbstractedLabel(G, ADA) denotes
the procedure which returns the abstracted label of
G from the abstracted dependency analyzer ADA. The
procedure transforms an original program into the pro- 
gram in which the top-down information is incorporated 
and consistency checking is restricted to those negative 
clauses relevant to the given goal. 

Consider the same example $P_b$ shown in the previous
subsection, in the case that the goal is bird(X). The
abstracted justifications $\bar{J}_b$ are

$\{ (\Rightarrow penguin),\ (penguin \Rightarrow bird),\ (bird, \Gamma_{fly} \Rightarrow fly)$,
  $(fly, notfly \Rightarrow false(1)),\ (penguin \Rightarrow notfly) \}$.

As the result of the abstracted dependency analysis,
the abstracted label of false(1) is $\{\{fly\}\}$ and the abstracted
label of bird is $\{\emptyset\}$. Then, no element in the
abstracted label of false(1) is a subset of any element in
the abstracted label of bird, so that we do not need to
evaluate this negative clause. As a consequence, we have
the transformed program:

$T2(bird, P_b) =$
  $\{\ goal(penguin) \to penguin(a)$,
    $goal(bird) \wedge penguin(X) \to bird(X)$,
    $goal(bird) \to goal(penguin)$,
    $goal(fly) \wedge bird(X) \to assume(fly(X))$,
    $goal(fly) \to goal(bird)$,
    $fly(X) \wedge notfly(X) \to \bot$,
    $goal(notfly) \wedge penguin(X) \to notfly(X)$,
    $goal(notfly) \to goal(penguin)\ \}$
  $\cup\ \{\ \to goal(bird)\ \}$.

Since the transformed program does not include $(\to goal(fly))$
and $(\to goal(notfly))$, the reasoner can omit
solving both the goal fly(X) and the goal notfly(X).

6 Evaluation with Logic Design 
Problem 

We have taken up the design of logic circuits to calcu- 
late the greatest common divisor (GCD) of two integers 
expressed in 8 bits by using the Euclidean algorithm. 
The solutions are circuits calculating GCD and satisfying 
given constraints on area and time [Maruyama et al. 88]. 
The program Pd contains several kinds of knowledge: 
datapath design, component design, technology map- 
ping, CMOS standard cells and constraints on area and 
time [Ohta and Inoue 90]. The design problem of calcu- 
lators for GCD includes design of components such as 
subtracters and adders. 

Table 1 shows the experimental results, obtained on a
Pseudo-Multi-PSI system, for the evaluation of the transformed
programs. The run time of a program P for a goal G
is denoted by $T_{R(G,P)}$. The predicate symbol $\bar{G}$ of each
goal G is adder (design of adders), subtracter (design of
subtracters) or cGCD (design of calculators for GCD).
On the original program $P_d$, the run time $T_{R(G,P_d)}$ is the
same for every goal G.



Table 1: Run Time of Program

Goal G        $T_{R(G,P_d)}$ [s]    $T_{R(G,P_1)}$ [s]    $T_{R(G,P_2)}$ [s]
adder         10.7                  17.5                  0.4
subtracter    10.7                  17.3                  0.6
cGCD          10.7                  17.3                  16.8



Let $P_1$ be the simple transformed program of $P_d$. The
experiment on the simple transformation time shows that
it takes 6.35 [s] to make $P_1$ from $P_d$. However, the run
time $T_{R(G,P_1)}$ is nearly the same for every goal G
because constraints on area and time of the GCD
calculators are represented by negative clauses. Even if
we want to design adders or subtracters, the hypothetical
reasoner cannot avoid designing GCD calculators for
consistency checking.

Let $P_2$ be the transformed program with the abstracted
dependency analysis. The experiment on the
transformation time with the abstracted dependency
analysis shows that it takes 6.63 [s] to make $P_2$ from
$P_d$. The transformation time with the abstracted dependency
analysis is slightly longer (by 0.28 [s]) than
the simple transformation time. When G is adder or
subtracter, the run time $T_{R(G,P_2)}$ is much shorter than
the run time for the design of GCD calculators. This is
because the program can avoid consistency checks for
negative clauses representing constraints on area and
time of the GCD calculators when the design of adders
or the design of subtracters is given as a goal. The results
show that, when the problem does not need the whole of
the program, the sum of the transformation time with
abstracted dependency analysis and the run time of the
transformed program is shorter than the run time of the
original program.
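
The comparison stated above can be checked directly from the figures reported in Table 1 and in the text. The following snippet is only an arithmetic check; the variable names are illustrative.

```python
# Run times in seconds, copied from Table 1 and the surrounding text.
run_original = {"adder": 10.7, "subtracter": 10.7, "cGCD": 10.7}
run_t2 = {"adder": 0.4, "subtracter": 0.6, "cGCD": 16.8}
transform_t2 = 6.63   # time to build P_2 from P_d

# Transformation time plus run time of the transformed program, per goal.
totals = {goal: round(transform_t2 + t, 2) for goal, t in run_t2.items()}
faster = {goal: totals[goal] < run_original[goal] for goal in totals}

for goal in totals:
    print(goal, totals[goal], "faster than original" if faster[goal] else "slower than original")
```

The total is below the original run time for adder and subtracter, and above it for cGCD, which needs the whole program.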

7 Related Work 

The algorithm for first-order Horn-clause abduction with
the ATMS is presented in [Ng and Mooney 91]. The system
is basically a consumer architecture [de Kleer 86-3]
introducing backward-chaining consumers. The algorithm
avoids both redundant proofs, by introducing
goal-directed backward-chaining consumers, and duplicate
proofs among different contexts, by using the ATMS.
Their problem definition is the same as that of [Stickel 90],
whose inputs are a goal and a set of Horn clauses without
negative clauses. When there are negative clauses in the
program, they briefly suggest that a forward-chaining
consumer can be used for each negative clause to check
consistency. On the other hand, since we only simulate
backward chaining with the forward-chaining reasoner, we
do not require both types of chaining rules. Moreover,
when the program includes negative clauses,
it is sometimes difficult to represent the clauses as a set
of consumers. For example, suppose that the axioms are

$\{a \to c,\ b \to d,\ c \wedge d \to g,\ c \to e,\ d \to f,\ e \wedge f \to \bot\}$

and the goal is g. Assume that the set of consumers is

$\{(c \Leftarrow a),\ (d \Leftarrow b),\ (g \Leftarrow c, d),\ (e \Leftarrow c),\ (f \Leftarrow d),\ (e, f \Rightarrow \bot)\}$,

where $\Leftarrow$ means a backward-chaining consumer and
$\Rightarrow$ means a forward-chaining consumer. Then, we
get the solution $\{(g, \{\{g\}, \{a, b\}, \{a, d\}, \{c, b\}, \{c, d\}\})\}$.
However, the correct solution is $\{(g, \{\{g\}\})\}$ because
$\{a, b\}$, $\{a, d\}$, $\{c, b\}$ and $\{c, d\}$ are nogoods. To guarantee
consistency when the program includes negative
clauses, we have to add the corresponding forward-chaining
consumer for every Horn clause. Such added consumers
would cause the same problem as the one that
appeared with the simple transformation algorithm.
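
The claim that only {g} survives can be checked with a small propositional sketch: forward-chain from each candidate environment and reject it if the contradiction is derived. The function names and the rule encoding are illustrative assumptions, not part of either system discussed here.

```python
from typing import FrozenSet, List, Set, Tuple

# The axioms of the example; "false" stands for the contradiction.
RULES: List[Tuple[FrozenSet[str], str]] = [
    (frozenset({"a"}), "c"),
    (frozenset({"b"}), "d"),
    (frozenset({"c", "d"}), "g"),
    (frozenset({"c"}), "e"),
    (frozenset({"d"}), "f"),
    (frozenset({"e", "f"}), "false"),
]


def closure(assumed: Set[str]) -> Set[str]:
    """Forward-chain the axioms from the assumed atoms to a fixpoint."""
    facts = set(assumed)
    changed = True
    while changed:
        changed = False
        for body, head in RULES:
            if body <= facts and head not in facts:
                facts.add(head)
                changed = True
    return facts


def consistent(assumed: Set[str]) -> bool:
    return "false" not in closure(assumed)


candidates = [{"g"}, {"a", "b"}, {"a", "d"}, {"c", "b"}, {"c", "d"}]
surviving = [env for env in candidates if consistent(env)]
print(surviving)
```

Every candidate other than {g} derives both e and f and is therefore a nogood.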

In [Stickel 91], deduction and abduction with the
upside-down meta-interpretation are proposed. This abduction
does not require the consistency of solutions.
Furthermore, rules may fire repeatedly in different
contexts since it does not use the ATMS. This often
causes a problem when it is applied to practical programs
where heavy procedures are attached to rules.

Another difference between the frameworks of
[Ng and Mooney 91, Stickel 91] and ours is that their
frameworks treat only hypotheses in the form of normal
defaults without prerequisites, whereas we allow for
normal defaults with prerequisites.

8 Conclusion 

We have presented a new transformation algorithm of
programs for efficient forward-chaining hypothetical reasoning
based on the upside-down meta-interpretation. In
the transformation algorithm, logical dependencies between
a goal and negative clauses are analyzed at an abstracted
level to find irrelevant negative clauses, so that
consistency checking of negative clauses can be restricted
to the relevant clauses. It has been evaluated with a
logic circuit design problem on a Pseudo-Multi-PSI system.

We can also apply this abstracted dependency analysis
to transformed programs based on the Magic Set and
Alexander methods. Our dependency analysis with only
predicate symbols may be extended to an analysis with
predicate symbols and some of their arguments.
Abstract

Probabilistic Horn abduction is a simple frame- 
work to combine probabilistic and logical rea- 
soning into a coherent practical framework. 

The numbers can be consistently interpreted 
probabilistically, and all of the rules can be in- 
terpreted logically. The relationship between 
probabilistic Horn abduction and logic pro- 
gramming is at two levels. At the first level 
probabilistic Horn abduction is an extension of 
pure Prolog, that is useful for diagnosis and 
other evidential reasoning tasks. At another 
level, current logic programming implementa- 
tion techniques can be used to efficiently imple- 
ment probabilistic Horn abduction. This forms 
the basis of an “anytime” algorithm for esti- 
mating arbitrary conditional probabilities. The 
focus of this paper is on the implementation. 

1 Introduction 

Probabilistic Horn Abduction [Poole, 1991c; Poole, 
1991b; Poole, 1992a] is a framework for logic-based ab- 
duction that incorporates probabilities with assump- 
tions. It is being used as a framework for diagnosis 
[Poole, 1991c] that incorporates both pure Prolog and 
Bayesian Networks [Pearl, 1988] as special cases [Poole, 
1991b]. This paper is about the relationship of proba- 
bilistic Horn abduction to logic programming. This sim- 
ple extension to logic programming provides a wealth of 
new applications in diagnosis, recognition and evidential 
reasoning [Poole, 1992a]. 

This paper also presents a logic-programming solution 
to the problem in abduction of searching for the “best” 
diagnoses first. The main features of the approach are: 

• We are using Horn clause abduction. The proce- 
dures are simple, both conceptually and computa- 
tionally (for a certain class of problems). We de- 
velop a simple extension of SLD resolution to im- 
plement our framework. 

• The search algorithms form “anytime” algorithms 
that can give an estimate of the conditional proba- 
bility at any time. We do not generate the unlikely 
explanations unless we need to. We have a bound on 



the probability mass of the remaining explanations 
which allows us to know the error in our estimates. 

• A theory of “partial explanations” is developed. 
These are partial proofs that can be stored in a priority
queue until they need to be further expanded.
We show how this is implemented in a Prolog inter- 
preter in Appendix A. 

2 Probabilistic Horn abduction 

The formulation of abduction used is a simplified form 
of Theorist [Poole et al., 1987; Poole, 1988] with prob- 
abilities associated with the hypotheses. It is simpli- 
fied in being restricted to definite clauses with simple 
forms of integrity constraints (similar to that in [Goebel 
et al., 1986]). This can also be seen as a generalisa- 
tion of an ATMS [Reiter and de Kleer, 1987] to be non- 
propositional. 

The language is that of pure Prolog (i.e., definite 
clauses) with special disjoint declarations that specify a 
set of disjoint hypotheses with associated probabilities. 
There are some restrictions on the forms of the rules and 
the probabilistic dependence allowed. The language pre- 
sented here is that of [Poole, 1992a] rather than that of 
[Poole, 1991c; Poole, 1991b]. 

The main design consideration was to make the language
the simplest extension to pure Prolog that also
includes probabilities (not just numbers associated with
rules, but numbers that follow the laws of probability,
and so can be consistently interpreted as probabilities
[Poole, 1992a]). We also make very strong independence
assumptions; this is intended not as a
temporary restriction on the language that we want to
eventually remove, but as a feature. We can represent
any probabilistic information using only independent
hypotheses [Poole, 1992a]; if there is any dependence
amongst hypotheses, we invent a new hypothesis
to explain that dependency.

2.1 The language 

Our language uses the Prolog conventions, and has the 
same definitions of variables, terms and atomic symbols. 

Definition 2.1 A definite clause is of the form: $a.$
or $a \leftarrow a_1 \wedge \cdots \wedge a_n.$ where a and each $a_i$ are atomic
symbols.

Definition 2.2 A disjoint declaration is of the form

    $disjoint([h_1 : p_1, \cdots, h_n : p_n]).$

where the $h_i$ are atoms, and the $p_i$ are real numbers
$0 \le p_i \le 1$ such that $p_1 + \cdots + p_n = 1$. Any variable
appearing in one $h_i$ must appear in all of the $h_j$ (i.e., the
$h_i$ share the same variables). The $h_i$ will be referred to
as hypotheses.

Definition 2.3 A probabilistic Horn abduction 
theory (which will be referred to as a “theory”) is a col- 
lection of definite clauses and disjoint declarations such 
that if a ground atom h is an instance of a hypothesis 
in one disjoint declaration, then it is not an instance of 
another hypothesis in any of the disjoint declarations. 

Given theory T, we define

$F_T$, the facts, to be the set of definite clauses in T together
with the clauses of the form

    $false \leftarrow h_i \wedge h_j$

where $h_i$ and $h_j$ both appear in the same disjoint
declaration in T, and $i \neq j$. Let $F'_T$ be the set of
ground instances of elements of $F_T$.

$H_T$ to be the set of hypotheses, the set of $h_i$ such that
$h_i$ appears in a disjoint declaration in T. Let $H'_T$
be the set of ground instances of elements of $H_T$.

$P_T$ to be a function $H'_T \to [0,1]$. $P_T(h'_i) = p_i$ where $h'_i$ is a
ground instance of hypothesis $h_i$, and $h_i : p_i$ is in a
disjoint declaration in T.
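
As a rough illustration of Definitions 2.2 and 2.3 in the ground (variable-free) case, the following sketch checks a list of disjoint declarations and generates the integrity constraints that are added to the facts. The function name `integrity_constraints`, the dictionary encoding of a declaration and the textual constraint format are assumptions made for this example.

```python
from itertools import combinations
from typing import Dict, List, Set


def integrity_constraints(declarations: List[Dict[str, float]]) -> List[str]:
    """Validate ground disjoint declarations and return the clauses false <- h_i & h_j."""
    seen: Set[str] = set()
    constraints: List[str] = []
    for decl in declarations:
        if any(not 0.0 <= p <= 1.0 for p in decl.values()):
            raise ValueError("each probability must lie in [0, 1]")
        if abs(sum(decl.values()) - 1.0) > 1e-9:
            raise ValueError("probabilities in a declaration must sum to 1")
        if seen & set(decl):
            # Definition 2.3: a hypothesis may occur in only one declaration.
            raise ValueError("hypothesis appears in two disjoint declarations")
        seen |= set(decl)
        for h_i, h_j in combinations(decl, 2):
            constraints.append(f"false <- {h_i} & {h_j}")
    return constraints


# The two declarations used in the example of Section 2.4.
DECLS = [{"b": 0.3, "c": 0.7}, {"e": 0.6, "f": 0.3, "g": 0.1}]
CONSTRAINTS = integrity_constraints(DECLS)
for clause in CONSTRAINTS:
    print(clause)
```

Each unordered pair is emitted once, since $h_i \wedge h_j$ and $h_j \wedge h_i$ give the same constraint.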

Where T is understood from context, we omit the sub- 
script. 

Definition 2.4 [Poole et al., 1987; Poole, 1987] If g is
a closed formula, an explanation of g from $\langle F, H \rangle$ is a
set D of elements of H' such that

• $F \cup D \models g$ and

• $F \cup D \not\models false$.

The first condition says that D is a sufficient cause for 
g, and the second says that D is possible. 

Definition 2.5 A minimal explanation of g is an ex- 
planation of g such that no strict subset is an explanation 
of g. 

2.2 Assumptions about the rule base 

Probabilistic Horn abduction also contains some as- 
sumptions about the rule base. It can be argued that 
these assumptions are natural, and do not really restrict 
what can be represented [Poole, 1992a]. Here we list 
these assumptions, and use them in order to show how 
the algorithms work. 

The first assumption we make is about the relationship 
between hypotheses and rules: 

Assumption 2.6 There are no rules with head unifying
with a member of H.

Instead of having a rule implying a hypothesis, we 
invent a new atom, make the hypothesis imply this atom, 
and all of the rules imply this atom, and use this atom 
instead of the hypothesis. 



Assumption 2.7 (acyclicity) If F' is the set of ground 
instances of elements of F, then it is possible to assign 
a natural number to every ground atom such that for 
every rule in F' the atoms in the body of the rule are 
strictly less than the atom in the head. 

This assumption is discussed in [Apt and Bezem,
1990].

Assumption 2.8 The rules in F' for a ground non- 
assumable atom are covering. 

That is, if the rules for a in F' are

    $a \leftarrow B_1$
    $a \leftarrow B_2$
    $\vdots$
    $a \leftarrow B_m$

then if a is true, one of the $B_i$ is true. Thus Clark’s completion
[Clark, 1978] is valid for every non-assumable. Often we
get around this assumption by adding a rule

    $a \leftarrow some\_other\_reason\_for\_a$

and making “some_other_reason_for_a” a hypothesis
[Poole, 1992a].

Lemma 2.9 [Console et al., 1991; Poole, 1988] Under
assumptions 2.6, 2.7 and 2.8, if expl(g,T) is the set of
minimal explanations of g from theory T:

$$g \equiv \bigvee_{e_i \in expl(g,T)} e_i$$

Assumption 2.10 The bodies of the rules in F' for an 
atom are mutually exclusive. 

Given the above rules for a, this means that

$B_i \wedge B_j \Rightarrow false$

is true in the domain under consideration for each $i \neq j$.
We can make this true by adding extra conditions to the
rules to make sure they are disjoint.

Lemma 2.11 Under assumptions 2.6 and 2.10 , mini- 
mal explanations of atoms or conjunctions of atoms are 
mutually inconsistent. 

See [Poole, 1992a] for more justification of these as- 
sumptions. 

2.3 Probabilities 

Associated with each possible hypothesis is a prior prob- 
ability. We use this prior probability to compute arbi- 
trary probabilities. 

The following is a corollary of lemmata 2.9 and 2.11

Lemma 2.12 Under assumptions 2.6, 2.7, 2.8, 2.10
and 2.13, if expl(g,T) is the set of minimal explanations
of conjunction of atoms g from probabilistic Horn
abduction theory T:

$$P(g) = P\Bigl(\bigvee_{e_i \in expl(g,T)} e_i\Bigr) = \sum_{e_i \in expl(g,T)} P(e_i)$$

Thus to compute the prior probability of any g we sum
the probabilities of the explanations of g.

To compute arbitrary conditional probabilities, we use 
the definition of conditional probability: 



$$P(\alpha \mid \beta) = \frac{P(\alpha \wedge \beta)}{P(\beta)}$$

Thus to find arbitrary conditional probabilities
$P(\alpha \mid \beta)$, we find $P(\beta)$, which is the sum of the
probabilities of the explanations of $\beta$, and $P(\alpha \wedge \beta)$,
which can be found by explaining $\alpha$ from the explanations
of $\beta$. Thus arbitrary conditional probabilities can be
computed by summing the prior probabilities of explanations.

It remains only to compute the prior probability of
an explanation D of g. We assume that logical dependencies
impose the only statistical dependencies on the
hypotheses. In particular we assume:

Assumption 2.13 Ground instances of hypotheses
that are not inconsistent (with $F_T$) are probabilistically
independent. That is, different disjoint declarations define
independent hypotheses.

The hypotheses in a minimal explanation are always
logically independent. The language has been carefully
set up so that the logic does not force any dependencies
amongst the hypotheses. If we could prove that some
hypotheses implied other hypotheses or their negations,
the hypotheses could not be independent. The language
is deliberately designed to be too weak to be able to state
such logical dependencies between hypotheses.

Under assumption 2.13, if $\{h_1, \ldots, h_n\}$ are part of a
minimal explanation, then

$$P(h_1 \wedge \cdots \wedge h_n) = \prod_{i=1}^{n} P(h_i)$$

To compute the prior of the minimal explanation we multiply
the priors of the hypotheses. The posterior probability
of the explanation is proportional to this.

The following is a corollary of lemmata 2.9 and 2.11

Lemma 2.14 Under assumptions 2.6, 2.7, 2.8, 2.10
and 2.13, if expl(g,T) is the set of all minimal explanations
of g from theory T:

$$P(g) = P\Bigl(\bigvee_{e_i \in expl(g,T)} e_i\Bigr) = \sum_{e_i \in expl(g,T)} P(e_i)$$

2.4 An example

In this section we show an example that we use later in
the paper. It is intended to be as simple as possible to
show how the algorithm works.

Suppose we have the rules and hypotheses:

    rule((a :- b, h)).
    rule((a :- q, e)).
    rule((q :- h)).
    rule((q :- b, e)).
    rule((h :- b, f)).
    rule((h :- c, e)).
    rule((h :- g, b)).
    disjoint([b:0.3, c:0.7]).
    disjoint([e:0.6, f:0.3, g:0.1]).

There are four minimal explanations of a, namely
{c, e}, {b, e}, {f, b} and {g, b}.

The priors of the explanations are as follows:

$P(c \wedge e) = 0.7 \times 0.6 = 0.42$.

Similarly $P(b \wedge e) = 0.18$, $P(f \wedge b) = 0.09$ and
$P(g \wedge b) = 0.03$. Thus

$P(a) = 0.42 + 0.18 + 0.09 + 0.03 = 0.72$

There are two explanations of $e \wedge a$, namely {c, e} and
{b, e}. Thus $P(e \wedge a) = 0.60$. Thus the conditional
probability of e given a is $P(e \mid a) = 0.6 / 0.72 = 0.833$.

What is important about this example is that all of
the probabilistic calculations reduce to finding the
probabilities of explanations.

2.5 Tasks

The following tasks are what we expect to implement:

1. Generate the explanations of some goal (conjunction
of atoms), in order.

2. Determine the prior probability of some goal. This
is implemented by enumerating the explanations of
the goal.

3. Determine the posterior probabilities of the explanations
of a goal (i.e., the probabilities of the explanations
given the goal).

4. Determine the conditional probability of one formula
given another. That is, determining $P(\alpha \mid \beta)$
for any $\alpha$ and $\beta$.

All of these will be implemented by enumerating the
explanations of a goal, and estimating the probability
mass in the explanations that have not been enumerated.
It is this problem that we consider for the next few
sections, and then return to the problem of the tasks we
want to compute.
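
The numbers in the example of Section 2.4 can be reproduced with a short propositional sketch. It is not the Prolog interpreter referred to in the paper: it enumerates explanations by plain backward chaining, rejects sets containing two hypotheses from the same disjoint declaration, and sums the products of the priors. The names `explain`, `minimal` and `prob` are hypothetical.

```python
from math import prod
from typing import Dict, FrozenSet, List, Set

# Rules and disjoint declarations of the example (ground, propositional).
RULES: Dict[str, List[List[str]]] = {
    "a": [["b", "h"], ["q", "e"]],
    "q": [["h"], ["b", "e"]],
    "h": [["b", "f"], ["c", "e"], ["g", "b"]],
}
DISJOINT = [{"b": 0.3, "c": 0.7}, {"e": 0.6, "f": 0.3, "g": 0.1}]
PRIOR = {h: p for decl in DISJOINT for h, p in decl.items()}


def consistent(hyps: FrozenSet[str]) -> bool:
    """At most one hypothesis from each disjoint declaration."""
    return all(len(hyps & set(decl)) <= 1 for decl in DISJOINT)


def explain(goals: List[str]) -> Set[FrozenSet[str]]:
    """All consistent sets of hypotheses that prove the conjunction of goals."""
    if not goals:
        return {frozenset()}
    first, rest = goals[0], goals[1:]
    found: Set[FrozenSet[str]] = set()
    if first in PRIOR:                          # assume the hypothesis
        for hyps in explain(rest):
            extended = hyps | {first}
            if consistent(extended):
                found.add(extended)
    for body in RULES.get(first, []):           # resolve with each rule
        found |= explain(body + rest)
    return found


def minimal(expls: Set[FrozenSet[str]]) -> Set[FrozenSet[str]]:
    return {d for d in expls if not any(e < d for e in expls)}


def prob(goals: List[str]) -> float:
    """Sum of the priors of the minimal explanations (Lemma 2.14)."""
    return sum(prod(PRIOR[h] for h in d) for d in minimal(explain(goals)))


p_a = prob(["a"])
p_e_and_a = prob(["e", "a"])
print(sorted(sorted(d) for d in minimal(explain(["a"]))))
print(p_a, p_e_and_a, p_e_and_a / p_a)
```

The sum is valid here only because the example satisfies the assumptions of Section 2.2, in particular that minimal explanations are mutually exclusive.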



3 A top-down proof procedure 

In this section we show how to carry out a best-first 
search of the explanations. In order to do this we build 
a notion of a partial proof that we can add to a priority 
queue, and restart when necessary. 

3.1 SLD-BF resolution 

In this section we outline an implementation based on 
logic programming technology and a branch and bound 
search. 

The implementation keeps a priority queue of sets 
of hypotheses that could be extended into explanations 
(“partial explanations”). At any time the set of all the 
explanations is the set of already generated explanations, 
plus those explanations that can be generated from the 
partial explanations in the priority queue. 
$Q := \{\langle g \leftarrow g, \{\} \rangle\}$;
$\Pi := \{\}$;
repeat
    choose and remove best $\langle g \leftarrow C, D \rangle$ from Q;
    if C = true
    then if good(D) then $\Pi := \Pi \cup \{D\}$ endif
    else Let $C = a \wedge R$
        for each rule($h \leftarrow B$) where $mgu(a, h) = \theta$
            $Q := Q \cup \{\langle g \leftarrow B \wedge R, D \rangle \theta\}$;
        if $a \in H$ and good($\{a\} \cup D$)
        then $Q := Q \cup \{\langle g \leftarrow R, \{a\} \cup D \rangle\}$
        endif
    endif
until $Q = \{\}$

where $good(D) \equiv (\forall d_1, d_2 \in D\ \nexists \eta \in NG\ \exists \phi\ \langle d_1, d_2 \rangle = \eta\phi)$
    $\wedge\ (\nexists \pi \in \Pi\ \exists \phi\ D \supseteq \pi\phi)$


Definition 3.2 A partial explanation $\langle g \leftarrow C, D\rangle$ is
valid with respect to $\langle F, H\rangle$ if

\[ F \models D \wedge C \Rightarrow g \]

Lemma 3.3 Every partial explanation in the queue $Q$
is valid with respect to $\langle F, H\rangle$.

Proof: This is proven by induction on the number of
times through the loop.

It is trivially true initially as $q \Rightarrow q$ for any $q$.

There are two cases where elements are added
to $Q$. In the first case (the “rule” case) we know

\[ F \models D \wedge R \wedge a \Rightarrow g \]

by the inductive assumption, and so



Figure 1: SLD-BF Resolution to find explanations of g 
in order. 
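
The loop of Figure 1 can be sketched for the propositional case as follows. This is only an illustration under stated assumptions, not the implementation described in the paper: the knowledge base is a small hypothetical one in the style of Section 2.4, the prior probabilities are invented for the example, unification is omitted because everything is ground, and the names RULES, DISJOINT, prob, good and explanations are ours. Ties between equally probable partial explanations are broken first-in-first-out here, which differs from the "last added" choice used in the trace of Section 3.2 but does not affect the order in which explanations are reported.

```python
import heapq
from itertools import count

# Hypothetical propositional knowledge base: head -> list of rule bodies.
RULES = {
    "a": [["b", "h"], ["q", "e"]],
    "q": [["h"], ["b", "e"]],
    "h": [["b", "f"], ["c", "e"], ["g", "b"]],
}
# Assumed priors; each dict plays the role of one disjoint declaration.
DISJOINT = [{"b": 0.3, "c": 0.7}, {"e": 0.6, "f": 0.3, "g": 0.1}]
PRIOR = {h: p for group in DISJOINT for h, p in group.items()}
# NG: unordered pairs of hypotheses from the same disjoint declaration.
NOGOODS = {frozenset((x, y)) for group in DISJOINT
           for x in group for y in group if x != y}


def prob(D):
    """Prior probability of a set of independent hypotheses."""
    p = 1.0
    for h in D:
        p *= PRIOR[h]
    return p


def good(D, found):
    """D contains no nogood and is not subsumed by a found explanation."""
    return (not any(ng <= D for ng in NOGOODS)
            and not any(e <= D for e in found))


def explanations(goal):
    """Yield (D, P(D)) for explanations of goal, most likely first."""
    tie = count()  # unique tie-breaker, so the heap never compares atoms
    queue = [(-1.0, next(tie), (goal,), frozenset())]
    found = []
    while queue:
        _, _, C, D = heapq.heappop(queue)  # best = highest P(D)
        if not C:  # C is the empty conjunction (true)
            if good(D, found):
                found.append(D)
                yield D, prob(D)
            continue
        a, R = C[0], C[1:]
        for body in RULES.get(a, []):  # the "rule" case
            heapq.heappush(queue, (-prob(D), next(tie), tuple(body) + R, D))
        if a in PRIOR and good(D | {a}, found):  # the hypothesis case
            D2 = D | {a}
            heapq.heappush(queue, (-prob(D2), next(tie), R, D2))
```

With these assumed priors, iterating over explanations("a") reports {c, e}, {b, e}, {b, f} and {b, g}, in that order, with probabilities 0.42, 0.18, 0.09 and 0.03 (up to floating-point rounding).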

Definition 3.1 A partial explanation is a structure

$\langle g \leftarrow C, D\rangle$

where $g$ is an atom (or conjunction of atoms), $C$ is a
conjunction of atoms and $D$ is a set of hypotheses.

Figure 1 gives an algorithm for finding explanations of
$g$ in order of probability (most likely first). At each step
we choose an element

$\langle g \leftarrow C, D\rangle$

of the priority queue $Q$ with maximum prior probability
of $D$.

We have an explanation when $C$ is the empty conjunction
(represented here as true). In this case $D$ is added
to the set $\Pi$ of already generated explanations.
Otherwise, suppose $C$ is the conjunction $a \wedge R$.

There are two operations that can be carried out. The
first is a form of SLD resolution [Lloyd, 1987], where for
each rule

$h \leftarrow b_1 \wedge \cdots \wedge b_n$

in $F$, such that $h$ and $a$ have most general unifier $\theta$, we
generate the partial explanation

$\langle g \leftarrow b_1 \wedge \cdots \wedge b_n \wedge R, D\rangle\theta$

and add it to the priority queue.

The second operation is used when $a \in H$. In this
case we produce the partial explanation

$\langle g \leftarrow R, \{a\} \cup D\rangle$

and add it to $Q$. We only do this if $\{a\} \cup D$ is consistent,
and is not subsumed by another explanation of $g$. Here
we assume the set $NG$ of pairs of hypotheses that appear
in the same disjoint declaration (corresponding to
nogoods in an ATMS [Reiter and de Kleer, 1987]). Unlike
in an ATMS this set can be built at compile time
from the disjoint declarations.

This procedure will find the explanations in order of 
likelihood. Its correctness is based on the meaning of a 
partial explanation 



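\[ F \models (D \wedge R \wedge a \Rightarrow g)\theta \]

We also know

\[ F \models (B \Rightarrow h)\theta \]

As $a\theta = h\theta$, by a simple resolution step we have

\[ F \models (D \wedge R \wedge B \Rightarrow g)\theta. \]

The other case is when $a \in H$. By the induction
step

\[ F \models D \wedge (a \wedge R) \Rightarrow g \]

and so

\[ F \models (D \wedge a) \wedge R \Rightarrow g \]

If $D$ only contains elements of $H$ and $a$ is an element
of $H$ then $\{a\} \cup D$ only contains elements
of $H$. $\Box$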

It is now trivial to show the following: 

Corollary 3.4 Every element of $\Pi$ in Figure 1 is an
explanation of $g$.

Although the correctness of the algorithm does not
depend on which element of the queue we choose at any
time, the efficiency does. We choose the best partial
explanation based on the following ordering of partial
explanations. Partial explanation $\langle g_1 \leftarrow C_1, D_1\rangle$ is better
than $\langle g_2 \leftarrow C_2, D_2\rangle$ if $P(D_1) > P(D_2)$. It is simple to
show that “better than” is a partial ordering. When we
choose a “best” partial explanation we choose a minimal
element of the partial ordering; where there are a number
of minimal partial explanations, we can choose any one.
When we follow this definition of “best”, we enumerate
the minimal explanations of $g$ in order of probability.

3.2 Our example 

In this section we show how the simple example in Sec- 
tion 2.4 is handled by the best-first proof process. 

The following is the sequence of values of Q each time 
through the loop (where there are a number of mini- 
mal explanations, we choose the element that was added 
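last):

$\{\langle a \leftarrow a, \{\}\rangle\}$

$\{\langle a \leftarrow b \wedge h, \{\}\rangle, \langle a \leftarrow q \wedge e, \{\}\rangle\}$

$\{\langle a \leftarrow q \wedge e, \{\}\rangle, \langle a \leftarrow h, \{b\}\rangle\}$

$\{\langle a \leftarrow h \wedge e, \{\}\rangle, \langle a \leftarrow b \wedge e \wedge e, \{\}\rangle, \langle a \leftarrow h, \{b\}\rangle\}$

$\{\langle a \leftarrow b \wedge f \wedge e, \{\}\rangle, \langle a \leftarrow c \wedge e \wedge e, \{\}\rangle, \langle a \leftarrow g \wedge b \wedge e, \{\}\rangle, \langle a \leftarrow b \wedge e \wedge e, \{\}\rangle, \langle a \leftarrow h, \{b\}\rangle\}$

$\{\langle a \leftarrow c \wedge e \wedge e, \{\}\rangle, \langle a \leftarrow g \wedge b \wedge e, \{\}\rangle, \langle a \leftarrow b \wedge e \wedge e, \{\}\rangle, \langle a \leftarrow f \wedge e, \{b\}\rangle, \langle a \leftarrow h, \{b\}\rangle\}$

$\{\langle a \leftarrow g \wedge b \wedge e, \{\}\rangle, \langle a \leftarrow b \wedge e \wedge e, \{\}\rangle, \langle a \leftarrow e \wedge e, \{c\}\rangle, \langle a \leftarrow f \wedge e, \{b\}\rangle, \langle a \leftarrow h, \{b\}\rangle\}$

$\{\langle a \leftarrow b \wedge e \wedge e, \{\}\rangle, \langle a \leftarrow e \wedge e, \{c\}\rangle, \langle a \leftarrow f \wedge e, \{b\}\rangle, \langle a \leftarrow h, \{b\}\rangle, \langle a \leftarrow b \wedge e, \{g\}\rangle\}$

$\{\langle a \leftarrow e \wedge e, \{c\}\rangle, \langle a \leftarrow e \wedge e, \{b\}\rangle, \langle a \leftarrow f \wedge e, \{b\}\rangle, \langle a \leftarrow h, \{b\}\rangle, \langle a \leftarrow b \wedge e, \{g\}\rangle\}$

$\{\langle a \leftarrow e, \{e, c\}\rangle, \langle a \leftarrow e \wedge e, \{b\}\rangle, \langle a \leftarrow f \wedge e, \{b\}\rangle, \langle a \leftarrow h, \{b\}\rangle, \langle a \leftarrow b \wedge e, \{g\}\rangle\}$

$\{\langle a \leftarrow true, \{e, c\}\rangle, \langle a \leftarrow e \wedge e, \{b\}\rangle, \langle a \leftarrow f \wedge e, \{b\}\rangle, \langle a \leftarrow h, \{b\}\rangle, \langle a \leftarrow b \wedge e, \{g\}\rangle\}$

Thus the first, and most likely, explanation is $\{e, c\}$.

$\{\langle a \leftarrow e \wedge e, \{b\}\rangle, \langle a \leftarrow f \wedge e, \{b\}\rangle, \langle a \leftarrow h, \{b\}\rangle, \langle a \leftarrow b \wedge e, \{g\}\rangle\}$

$\{\langle a \leftarrow f \wedge e, \{b\}\rangle, \langle a \leftarrow h, \{b\}\rangle, \langle a \leftarrow e, \{e, b\}\rangle, \langle a \leftarrow b \wedge e, \{g\}\rangle\}$

$\{\langle a \leftarrow h, \{b\}\rangle, \langle a \leftarrow e, \{e, b\}\rangle, \langle a \leftarrow b \wedge e, \{g\}\rangle, \langle a \leftarrow e, \{f, b\}\rangle\}$

$\{\langle a \leftarrow b \wedge f, \{b\}\rangle, \langle a \leftarrow c \wedge e, \{b\}\rangle, \langle a \leftarrow g \wedge b, \{b\}\rangle, \langle a \leftarrow e, \{e, b\}\rangle, \langle a \leftarrow b \wedge e, \{g\}\rangle, \langle a \leftarrow e, \{f, b\}\rangle\}$

$\{\langle a \leftarrow f, \{b\}\rangle, \langle a \leftarrow c \wedge e, \{b\}\rangle, \langle a \leftarrow g \wedge b, \{b\}\rangle, \langle a \leftarrow e, \{e, b\}\rangle, \langle a \leftarrow b \wedge e, \{g\}\rangle, \langle a \leftarrow e, \{f, b\}\rangle\}$

$\{\langle a \leftarrow c \wedge e, \{b\}\rangle, \langle a \leftarrow g \wedge b, \{b\}\rangle, \langle a \leftarrow e, \{e, b\}\rangle, \langle a \leftarrow b \wedge e, \{g\}\rangle, \langle a \leftarrow true, \{f, b\}\rangle, \langle a \leftarrow e, \{f, b\}\rangle\}$

Here the algorithm effectively prunes the top partial
explanation as $\langle c, b\rangle$ forms a nogood.

$\{\langle a \leftarrow g \wedge b, \{b\}\rangle, \langle a \leftarrow e, \{e, b\}\rangle, \langle a \leftarrow b \wedge e, \{g\}\rangle, \langle a \leftarrow true, \{f, b\}\rangle, \langle a \leftarrow e, \{f, b\}\rangle\}$

$\{\langle a \leftarrow e, \{e, b\}\rangle, \langle a \leftarrow b \wedge e, \{g\}\rangle, \langle a \leftarrow true, \{f, b\}\rangle, \langle a \leftarrow e, \{f, b\}\rangle, \langle a \leftarrow b, \{g, b\}\rangle\}$

$\{\langle a \leftarrow true, \{e, b\}\rangle, \langle a \leftarrow b \wedge e, \{g\}\rangle, \langle a \leftarrow true, \{f, b\}\rangle, \langle a \leftarrow e, \{f, b\}\rangle, \langle a \leftarrow b, \{g, b\}\rangle\}$

We have now found the second most likely explanation,
namely $\{e, b\}$.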



4 Discussion 

4.1 Probabilities in the queue 

We would like to give an estimate for $P(g)$ after having
generated only a few of the most likely explanations of $g$,
and get some estimate of our error. This problem reduces
to estimating the probability of partial explanations in
the queue.

If $\langle g \leftarrow C, D\rangle$ is in the priority queue, then it can possibly
be used to generate explanations $D_1, \ldots, D_n$. Each
$D_i$ will be of the form $D \wedge D'_i$. We can place a bound on
the probability mass of all of the $D_i$, by

\[ P(D_1 \vee \cdots \vee D_n) = P(D \wedge (D'_1 \vee \cdots \vee D'_n)) \leq P(D) \]

Given this upper bound, we can determine an upper
bound for $P(g)$, where $\{e_1, \ldots, e_n\}$ is the set of all minimal
explanations of $g$:

\[ P(g) = P(e_1 \vee e_2 \vee \cdots \vee e_n) \]
\[ = P(e_1) + P(e_2) + \cdots + P(e_n) \]
\[ = \Bigl(\sum_{e_i \text{ found}} P(e_i)\Bigr) + \Bigl(\sum_{e_j \text{ to be generated}} P(e_j)\Bigr) \]

We can easily compute the first of these sums, and can
put upper and lower bounds on the second. This means
that we can put a bound on the range of probabilities of
a goal based on finding just some of the explanations of
the goal. Suppose we have goal $g$, and we have generated
explanations $\Pi$. Let

\[ P_\Pi = \sum_{D \in \Pi} P(D) \]

\[ P_Q = \sum_{D : \langle g \leftarrow C, D\rangle \in Q} P(D) \]

where $Q$ is the priority queue.

We then have

\[ P_\Pi \leq P(g) \leq P_\Pi + P_Q \]

As the computation progresses, the probability mass
in the queue $P_Q$ approaches zero¹ and we get a better
refinement on the value of $P(g)$. This thus forms the
basis of an “anytime” algorithm for Bayesian networks.
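
As a small numerical illustration of these bounds, the following sketch computes the interval for $P(g)$ from the probabilities of the explanations found so far and of the partial explanations still in the queue. The function name goal_bounds and the sample numbers are assumptions made for the example.

```python
def goal_bounds(found_probs, queue_probs):
    """Return (lower, upper) bounds on P(g).

    found_probs: P(D) for each explanation D generated so far (sums to P_Pi).
    queue_probs: P(D) for each partial explanation in the queue (sums to P_Q).
    """
    p_found = sum(found_probs)
    p_queue = sum(queue_probs)
    return p_found, p_found + p_queue


# Two explanations found, one partial explanation left in the queue.
lower, upper = goal_bounds([0.42, 0.18], [0.1])
```

Here the interval is approximately [0.6, 0.7]; its width is exactly the queue mass, so the estimate can be reported at any time and tightens whenever that mass shrinks.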

4.2 Conditional Probabilities 



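$\{\langle a \leftarrow b \wedge e, \{g\}\rangle, \langle a \leftarrow true, \{f, b\}\rangle, \langle a \leftarrow e, \{f, b\}\rangle, \langle a \leftarrow b, \{g, b\}\rangle\}$

$\{\langle a \leftarrow true, \{f, b\}\rangle, \langle a \leftarrow e, \{f, b\}\rangle, \langle a \leftarrow e, \{g, b\}\rangle, \langle a \leftarrow b, \{g, b\}\rangle\}$

We have thus found the third explanation $\{f, b\}$.

$\{\langle a \leftarrow e, \{f, b\}\rangle, \langle a \leftarrow e, \{g, b\}\rangle, \langle a \leftarrow b, \{g, b\}\rangle\}$

$\{\langle a \leftarrow e, \{g, b\}\rangle, \langle a \leftarrow b, \{g, b\}\rangle\}$

$\{\langle a \leftarrow b, \{g, b\}\rangle\}$

$\{\langle a \leftarrow true, \{g, b\}\rangle\}$

The fourth explanation is $\{g, b\}$. There are no more
partial explanations and the process stops.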



We can also use the above procedure to compute conditional
probabilities. Suppose we are trying to compute
the conditional probability $P(\alpha \mid \beta)$. This can be computed
from the definition:

\[ P(\alpha \mid \beta) = \frac{P(\alpha \wedge \beta)}{P(\beta)} \]

We compute the conditional probabilities by enumerating
the minimal explanations of $\alpha \wedge \beta$ and $\beta$. Note that
the minimal explanations of $\alpha \wedge \beta$ are explanations (not


1 Note that the estimate given above does not always de- 
crease. It is possible that the error estimate increases. [Poole, 
1992b] considers cases where convergence can be guaranteed. 
necessarily minimal) of $\beta$. We can compute the explanations
of $\alpha \wedge \beta$ by trying to explain $\alpha$ from the explanations
of $\beta$. The above procedure can be easily adapted
for this task, by making the task to explain $\beta \wedge \alpha$, and
making sure we prove $\beta$ before we prove $\alpha$, so that we
can collect the explanations of $\beta$ as we generate them.
Let $P^{\beta}$ be the sum of the probabilities of the explanations
of $\beta$ enumerated, and let $P^{\alpha \wedge \beta}$ be the sum of the
probabilities of the explanations of $\alpha \wedge \beta$ generated.

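Thus given our estimates of $P(\alpha \wedge \beta)$ and $P(\beta)$ we
have

\[ \frac{P^{\alpha \wedge \beta}}{P^{\beta} + P_Q} \leq P(\alpha \mid \beta) \leq \frac{P^{\alpha \wedge \beta} + P_Q}{P^{\beta}} \]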



The lower bound is the case where all of the partial
descriptions in the queue go towards worlds implying $\beta$,
but none of these also lead to $\alpha$. The upper bound is the
case where all of the elements of the queue go towards
implying $\alpha$, from the explanations already generated for
$\beta$.


4.3 Consistency and subsumption checking 

One problem that needs to be considered is the prob- 
lem of what happens when there are free variables in 
the hypotheses generated. When we generate the hy- 
potheses, there may be some instances of the hypotheses 
that are inconsistent, and some that are consistent. We 
know that every instance is inconsistent if the subgoal is
subsumed by a nogood. This can be determined by substituting
constants for the variables in the subgoal,
and finding if a subset unifies with a nogood.

We cannot prune hypotheses because an instance is in- 
consistent. However, when computation progresses, we 
may substitute a value for a variable that makes the par- 
tial explanation inconsistent. This problem is similar to 
the problem of delaying negation-as-failure derivations 
[Naish, 1986], and of delaying consistency checking in 
Theorist [Poole, 1991a]. We would like to notice such 
inconsistencies as soon as possible. In the algorithm of 
Figure 1 we check for inconsistency each time a par- 
tial explanation is taken off the queue. There are cases 
where we do not have to check this explicitly, for exam- 
ple when we have done a resolution step that did not 
assign a variable. There is a trade-off between checking
consistency and allowing some inconsistent hypotheses
on the queue². This trade-off is beyond the scope of this
paper.

Note that the assumptions used in building the system 
imply that there can be no free variables in any explana- 
tion of a ground goal (otherwise we have infinitely many 
disjoint explanations with bounded probability). Thus 
delaying subgoals eventually grounds all variables. 



same as the iterative deepening version of A* with the 
heuristic function of zero [Korf, 1985]. The algorithm of
Figure 1 is given at a level of abstraction which does
not preclude iterative deepening.

For our experimental implementations, we have used 
an interesting variant of iterative deepening. Our queue 
is only a “virtual queue” and we only physically store 
partial explanations with probability greater than some 
threshold. We remember the mass of the whole queue, 
including the values we have chosen not to store. When 
the queue is empty, we decrease the threshold. We can 
estimate the threshold that we need for some given accu- 
racy. This speeds up the computation and requires less 
space. 
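
The bookkeeping of such a “virtual queue” might look like the following sketch. It is not the implementation used in the experiments: the class name VirtualQueue and its methods are assumptions, and the step of lowering the threshold and regenerating the unstored partial explanations (by rerunning the search) is deliberately left out.

```python
import heapq
from itertools import count


class VirtualQueue:
    """Stores only items whose probability exceeds a threshold, while
    tracking the mass of everything offered and not yet removed."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.mass = 0.0          # stored and unstored items together
        self._heap = []
        self._tie = count()      # unique tie-breaker for equal probabilities

    def add(self, prob, item):
        self.mass += prob        # always counted, even if not stored
        if prob > self.threshold:
            heapq.heappush(self._heap, (-prob, next(self._tie), item))

    def pop(self):
        """Remove and return the most probable stored (prob, item)."""
        neg_prob, _, item = heapq.heappop(self._heap)
        prob = -neg_prob
        self.mass -= prob
        return prob, item

    def stored(self):
        return len(self._heap)
```

Because the mass of the discarded items is still included in the total, the queue mass remains usable as the error term of Section 4.1 even though those items are not physically present.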

4.5 Recomputing subgoals 

One of the problems with the above procedure is that 
it recomputes explanations for the same subgoal. If s is 
queried as a subgoal many times then we keep finding 
the same explanations for s. This has more to do with 
the notion of SLD resolution used than with the use of 
branch and bound search. 

We are currently experimenting with a top-down procedure
where we remember results that we have already
computed, forming “lemmata”. This is similar to the use
of memo functions [Sterling and Shapiro, 1986] or Earley
deduction [Pereira and Shieber, 1987] in logic programming,
but we have to be very careful with the interaction
between making lemmata and the branch and bound
search, particularly as there may be multiple answers to
any query, and just because we ask a query does not
mean we want to solve it (we may only want to bound
the probability of the answer).

4.6 Bounding the priority queue 

Another problem with the above procedure, one that is not
solved by lemmatisation, is that the bound given by the
priority queue can become quite large (i.e., greater than one).
Some bottom-up procedures [Poole, 1992b] can have an
accurate estimate of the probability mass of the queue
(i.e., an accurate bound on how much probability mass
could be on the queue based on the information at hand).
See [Poole, 1992b] for a description of a bottom-up pro- 
cedure that can be compared to the top-down procedure 
in this paper. In [Poole, 1992b] an average case analysis 
is given on the bottom-up procedure; while this is not 
an accurate estimate for the top-down procedure, the 
case where the bottom-up procedure is efficient [Poole, 
1992b] is the same case where the top-down procedure 
works well; that is where there are normality conditions 
that dominate the probability of each hypothesis (i.e., 
where all of the probabilities are near one or near zero). 



4.4 Iterative deepening 

In many search techniques we often get much better 
space complexity and asymptotically the same time com- 
plexity by using an iterative deepening version of a 
search procedure [Korf, 1985]. An iterative deepening
version of the best-first search procedure is exactly the 

2 We have to check the consistency at some time. This
could be as late as just before the explanation is added to $\Pi$.



5 Comparison with other systems 

There are many other proposals for logic-based abduction
schemes (e.g., [Pople, 1973; Cox and Pietrzykowski,
1987; Goebel et al., 1986; Poole, 1987]). These, however,
consider that we either find an arbitrary explanation or 
find all explanations. In practice there are prohibitively 
many of these. It is also not clear what to do with all 
of the explanations; there are too many to give to a 
user, and the costs of determining which of the explanations
is the “real” explanation (by doing tests [Sattar
and Goebel, 1991]) are usually not outweighed by the
advantages of finding the real explanation. This is why
it is important to take into account probabilities. We 
then have a principled reason for ignoring many expla- 
nations. Probabilities are also the right tool to use when 
we really are unsure as to whether something is true or 
not. For evidential reasoning tasks (e.g., diagnosis and 
recognition) it is not up to us to decide whether some 
hypothesis is true or not; all we have is probabilities 
and evidence to work out what is most likely true. Simi- 
lar considerations motivated the addition of probabilities 
to consistency-based diagnosis [de Kleer and Williams, 
1989]. 

Perhaps the closest work to that presented here is that 
of Stickel [Stickel, 1988]. His is an iterative deepening 
search for the lowest cost explanation. He does not con- 
sider probabilities. 

6 Using existing logic programming 
technology 

In this section we show how the branch and bound search 
can be compiled into Prolog. The basic idea is that when 
we are choosing a partial explanation to explore, we can 
choose any of those with maximum probability. If we 
choose the last one when there is more than one, we 
carry out a depth-first search much like normal Prolog, 
except when making assumptions. We only add to the 
priority queue when making assumptions, and let Prolog 
do the searching when we are not. 

6.1 Remaining subgoals 

Consider what subgoals remain to be solved when we are 
trying to solve a goal. Consider the clause: 

$h \leftarrow b_1 \wedge b_2 \wedge \cdots \wedge b_m$.

Suppose $R$ is the conjunction of subgoals that remain
to be solved after $h$ in the proof. If we are using the
leftmost reduction of subgoals, then the conjunction of
subgoals remaining to be solved after subgoal $b_i$ is

$b_{i+1} \wedge \cdots \wedge b_m \wedge R$

The total information of the proof is contained in the 
partial explanation at the point we are in the proof, i.e., 
in the remaining subgoals, current hypotheses and the 
associated answer. The idea we exploit is to make this 
set of subgoals explicit by adding an extra argument to 
each atomic symbol that contains all of the remaining 
subgoals. 
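
The extra argument can be pictured with a short sketch that pairs each subgoal of a clause body with the conjunction left to solve after it. The function name continuations and the use of tuples for conjunctions are assumptions made for illustration.

```python
def continuations(body, rest):
    """Pair each subgoal b_i of a clause body with the subgoals remaining
    after it: b_{i+1}, ..., b_m followed by R (the argument rest)."""
    body, rest = tuple(body), tuple(rest)
    return [(b, body[i + 1:] + rest) for i, b in enumerate(body)]


# For h <- b1, b2, b3 with R = (r,):
pairs = continuations(["b1", "b2", "b3"], ["r"])
# pairs == [("b1", ("b2", "b3", "r")), ("b2", ("b3", "r")), ("b3", ("r",))]
```

Each pair carries everything needed to resume the proof from that subgoal, which is what allows a partial proof to be saved and restarted later.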

6.2 Saving partial proofs 

There is enough information within each subgoal to 
prove the top level goal it was created to solve. When we 
have a hypothesis that needs to be assumed, the remain- 
ing subgoals and the current hypotheses form a partial 
explanation which we save on the queue. We then fail 
the current subgoal and look for another solution. If 
there are no solutions found (i.e., the top level computa- 
tion fails), we can choose a saved subgoal (according to 
the order given in section 3.1), and continue the search. 



Suppose in our proof we select a possible hypothesis 
h of cost P({h}) with U being the conjunction of goals 
remaining to be solved, and T the set of currently as- 
sumed hypotheses with cost P(T). We only want to 
consider this as a possible contender for the best solu- 
tion if P({h} U T ) is the minimal cost of all proofs being 
considered. The minimal cost proofs will be other proofs 
of cost P(T). These can be found by failing the current 
subgoal. Before we do this we need to add $U$, with hypotheses
$\{h\} \cup T$, to the priority queue. When the proof
fails we know there is no proof with the current set of 
hypotheses; we remove the partial proof with minimal 
cost from the priority queue, and continue this proof. 

We do a branch and bound search over the partial 
explanations, but when the priorities are equal, we use 
Prolog’s search to prefer the last added. The overhead on 
the resolution steps is low; we only have to do a couple 
more simple unifications (a free variable with a term). 
The main overhead occurs when we reach a hypothesis. 
Here we store the hypotheses and remaining goals on 
a priority queue and continue our search by failing the
current goal. This is quick (if we implement the priority 
queue efficiently); the overhead needed to find all proofs 
is minimal. 

Appendix A gives code necessary to run the search 
procedure. 

7 Conclusion 

This paper has considered a logic programming approach 
that uses a mix between depth-first and branch-and- 
bound search strategies for abduction where we want 
to consider probabilities, and only want to generate the 
most likely explanations. The underlying language is 
a superset of pure Prolog (without negation-as-failure),
and the overhead of executing pure Prolog programs is 
small. 

A Prolog interpreter 

This appendix gives a brief overview of a meta- 
interpreter. Hopefully it is enough to be able to build 
a system. Our implementation contains more bells and 
whistles, but the core of it is here. 

A.1 Prove

prove(G, T0, T1, C0, C1, U)

means that G can be proven with current assumptions
T0, resulting in assumptions T1, where Ci is the probability
of Ti, and U is the set of remaining subgoals.

The first rule defining prove is a special purpose rule 
for the case where we have found an explanation; this 
reports on the answer found. 

prove(ans(A),T,T,C,C,_) :- !,
    ans(A,T,C).

The remaining rules are the real definition, that follow 
a normal pattern of Prolog meta-interpreters [Sterling 
and Shapiro, 1986]. 

prove(true,T,T,C,C,_) :- !.
% Conjunction: B joins the remaining subgoals while A is being proven.
prove((A,B),T0,T2,C0,C2,U) :- !,
    prove(A,T0,T1,C0,C1,(B,U)),
    prove(B,T1,T2,C1,C2,U).
% H is a hypothesis that has already been assumed.
prove(H,T,T,C,C,_) :-
    hypothesis(H,PH),
    member(H,T), !.
% H is a new hypothesis: save the partial proof on the queue and fail.
prove(H,T,[H|T],C,C1,U) :-
    hypothesis(H,PH),
    \+ (( member(H1,T), makeground((H,H1)),
          nogood(H,H1) )),
    C1 is C*PH,
    add_to_PQ(process([H|T],C1,U)),
    fail.
% Ordinary resolution step using a stored rule.
prove(G,T0,T1,C0,C1,U) :-
    rul(G,B),
    prove(B,T0,T1,C0,C1,U).

A.2 Rule and disjoint declarations

We specify the rules of our theory using the declaration 
rule(R) where R is the form of a Prolog rule. This asserts 
the rule produced. 

rule((H :- B)) :- !,
    assert(rul(H,B)).
rule(H) :-
    assert(rul(H,true)).

The disjoint declaration forms nogoods and declares 
probabilities of hypotheses. 

:- op(500, xfx, :).
disjoint([]).
disjoint([H:P|R]) :-
    assert(hypothesis(H,P)),
    make_disjoint(H,R),
    disjoint(R).

make_disjoint(_,[]).
make_disjoint(H,[H2:_|R]) :-
    assert(nogood(H,H2)),
    assert(nogood(H2,H)),
    make_disjoint(H,R).

A.3 Explaining

To find an explanation for a subgoal G we execute 
explain(G). This creates a list of solved explanations 
and the probability mass found (in “done”), and creates 
an empty priority queue. 

explain(G) :-
    assert(done([],0)),
    initq,
    ex((G,ans(G)),[],1), !.

ex(G, D, C) tries to prove G with assumptions D such
that the probability of D is C. If G cannot be proven, a partial
proof is taken from the priority queue and restarted.
This means that ex(G, D, C) succeeds if there is some
proof that succeeds.

ex(G,D,C) :-
    prove(G,D,_,C,_,true).
ex(_,_,_) :-
    remove_from_PQ(process(D,C,U)), !,
    ex(U,D,C).



We can report the explanations found, the estimates 
of the prior probability of the hypothesis, etc, by defin- 
ing ans(G, D,C), which means that we have found an 
explanation D of G with probability C. 
ans(G,[],_) :-
    writeln([G,' is a theorem.']), !.
ans(G,D,C) :-
    allgood(D),
    qmass(QM),
    retract(done(Done,DC)),
    DC1 is DC+C,
    assert(done([expl(G,D,C)|Done],DC1)),
    TC is DC1 + QM,
    writeln(['Probability of ',G,
             ' = [',DC1,',',TC,']']),
    Pr1 is C / TC,
    Pr2 is C / DC1,
    writeln(['Explanation: ',D]),
    writeln(['Prior = ',C]),
    writeln(['Posterior = [',Pr1,',',Pr2,']']).

more is a way to ask for more answers. It will take 
the top priority partial proof and continue with it. 

more :- ex(fail,_,_).

A.4 Auxiliary relations used

The following relations were also used. They can be 
divided into those for managing the priority queue, and 
those for managing the nogoods. 

We assume that there is a global priority queue into 
which one can put formulae with an associated cost and 
from which one can extract the least cost formulae. We 
assume that the priority queue persists over failure of 
subgoals. It can thus be implemented by asserting into 
a Prolog database, but cannot be implemented by carry- 
ing it around as an extra argument in a meta-interpreter 
[Sterling and Shapiro, 1986], for example. We would like 
both insertion and removal from the priority queue to be 
carried out in log n time where n is the number of ele- 
ments of the priority queue. Thus we cannot implement 
it by having the queue asserted into a Prolog database 
if the asserting and retracting takes time proportional 
to the size of the objects asserted or retracted (which it 
seems to in the implementations we have experimented 
with). 

Four operations are defined:

initq

initialises the queue to be the empty queue, with zero
queue mass.

add_to_PQ(process(D, C, U))

adds assumption set D, with probability C and remaining
subgoals U, to the priority queue. Adds C to the
queue mass.

remove_from_PQ(process(D, C, U))

if the priority queue is not empty, extracts the element
with highest probability (highest value of C) from
the priority queue and reduces the queue mass by C.
remove_from_PQ fails if the priority queue is empty.

qmass(M)

returns the sum of the probabilities of elements of the
queue.
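
Outside Prolog, the same four operations could be sketched with a binary heap, which gives logarithmic insertion and removal. This is an illustration of the interface described above rather than the code used in the system; the class ProcessQueue and its method names are assumptions, and returning None stands in for Prolog failure.

```python
import heapq
from itertools import count


class ProcessQueue:
    """Priority queue of process(D, C, U) entries ordered by probability C."""

    def __init__(self):
        # Corresponds to initialising an empty queue with zero queue mass.
        self._heap = []
        self._tie = count()   # unique tie-breaker for equal probabilities
        self._mass = 0.0

    def add_to_pq(self, D, C, U):
        """Add assumption set D with probability C and remaining subgoals U."""
        heapq.heappush(self._heap, (-C, next(self._tie), D, U))
        self._mass += C

    def remove_from_pq(self):
        """Extract the entry with the highest C, or None if the queue is empty."""
        if not self._heap:
            return None
        neg_c, _, D, U = heapq.heappop(self._heap)
        self._mass += neg_c   # neg_c is -C, so this subtracts C
        return D, -neg_c, U

    def qmass(self):
        """Sum of the probabilities of the entries currently in the queue."""
        return self._mass
```

Since the object lives outside any single proof attempt, it persists when a subgoal fails, which is the property the text requires of the global queue.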

We assume the relation for handling nogoods: 
allgood(L) 

fails if L has a subset that has been declared nogood. 
Abstract

Equality can be added to logic programming by using 
surface deduction. Surface deduction yields interpreta- 
tions of unification failures in terms of residual hypothe- 
ses needed for unification to succeed. It can therefore 
be used for abductive reasoning with equality. In sur- 
face deduction the input clauses are first transformed to 
a flat form (involving no nested terms) and symmetrized 
(if necessary). They are then manipulated by binary 
resolution, a restricted version of factoring and compres- 
sion. The theoretical properties of surface deduction, 
including refutation completeness and weak deductive 
completeness properties (relative to equality), are estab- 
lished in [Cox et al. 1991]. In this paper we show that 
these properties imply that an enhancement of surface 
deduction will yield all parsimonious hypotheses when 
used as an abductive inference engine. The character- 
ization of equational implication for goal clauses given 
in [Cox et al. 1991] is shown to yield a uniquely defined 
equationally equivalent residuum for every goal clause. 
The residuum naturally represents the corresponding ab- 
ductive hypothesis. An example illustrating the use of 
surface deduction in abductive reasoning is presented. 

1 Introduction 

In abductive reasoning, the task is to explain a 
given observation by introducing appropriate hypotheses 
([Cox and Pietrzykowski 1987], [Goebel 1990]). Most 
presentations of abduction do not include reasoning with 
equality, nor do they allow the introduction of equal- 
ity assumptions to explain an observation. A notable 
exception is E. Charniak’s work on motivation analy- 
sis [Charniak 1988]. Charniak allows the introduction of 
certain restricted equality assumptions to determine mo- 
tivations for observed actions. He shows that the intro- 
duction of such equality assumptions is required to suc- 
cessfully abduce motivations. In this paper we consider 
the problem of abductive reasoning with Horn clauses in
the presence of equality. We show that surface deduc- 
tion has the necessary properties for use in an abductive 



inference system provided that the input theory contains 
the function substitutivity axioms. 

In the presence of equality, an abduction problem
consists of a theory $T$ and a formula $O$ (the observation).
An explanation of $(O, T)$ is a formula $E$ consistent with
$T$ such that $E$ together with $T$ equationally implies $O$.
We will assume that $O$ and $E$ are existentially quantified
conjunctions of facts and that $T$ is a Horn clause theory.

One way to obtain an explanation $E$, given an observation
$O$ and a theory $T$, is to deduce $\neg E$ from $T$ and
$\neg O$. Since explanations with less irrelevant information
are preferred (the parsimony principle), it is sufficient to
deduce a clause $\neg E'$ such that $\neg E'$ implies $\neg E$. Intuitively,
$E'$ is at least as good an explanation as $E$ (see
Section 4). It follows that a deduction system adequate
for abductive reasoning should satisfy a weak deductive
completeness: If the theory $T$ implies a non-tautological
clause $\neg E$, then we must be able to deduce a clause $\neg E'$
from $T$ such that $\neg E'$ implies $\neg E$. In the absence of
equality, SLD-resolution (see [Lloyd 1984]) satisfies this
condition.

The problem of introducing equality to Horn clause 
logic has been well-studied, see [Holldobler 1989] for an 
excellent overview. The simplest approach to this prob- 
lem involves adding the equality axioms (which are Horn 
clauses) to the set of input clauses. However, unre- 
stricted use of these axioms results in inefficiency. Fur- 
thermore, this approach does not yield any insights into 
the degree to which the equality axioms are needed. 
Paramodulation and other term rewriting systems do 
not explicitly introduce new equality assumptions into 
derivations and therefore do not satisfy the weak deduc- 
tive completeness condition. Other approaches, such as 
the ones in [van Emden and Lloyd 1984] and extended 
in [Hoddinott and Elcock 1986] using the homogeneous 
form of clauses, require restricting the form of the input 
theory. Here, we use the results of [Cox et al. 1991] to 
show that if equality is introduced to Horn clause logic 
via surface deduction with the function substitutivity ax- 
ioms, then all preferred explanations for an abduction 
problem can be obtained. The need for axioms of equal- 
ity other than function substitutivity is thus eliminated. 







In surface deduction, a set of input clauses is first 
transformed to a flat form and symmetrized. The deduc- 
tion then proceeds using linear input resolution for Horn 
clauses (see [Lloyd 1984]) together with a limited use of 
factoring and a new rule called compression. The addi- 
tional deduction rules are equivalent to those restricted 
uses of the reflexivity axiom ($x \doteq x$) which preserve
flatness. They are required only at the end of a deduc- 
tion. 

A clause is flat if it has no nested functional expres- 
sions, and every variable which appears immediately to 
the right of an equality symbol ($\doteq$) appears only in such
positions. A stronger version of flatness requires that in 
addition the clause is separated. This means that every 
variable appears at most once in any given literal and has 
only one occurrence inside a functional or relational ex- 
pression. Symmetrization affects only those clauses with 
equalities in their heads (see Section 3). 

The idea of using flattening to add equality to the- 
orem proving is due to [Brand 1975] and is applied 
to logic programming in [Cox and Pietrzykowski 1986] 
where surface deduction is defined. Flattening is 
closely related to narrowing. In narrowing the pro- 
cess of flattening is implicit in the deduction rules. 
The relationship between the two methods is exam- 
ined in [Bosco et al. 1988]. Separation of terms is im- 
plicit in the transformations to the homogeneous forms 
of [Hoddinott and Elcock 1986]. The symmetrization 
method used here is similar to the one introduced in 
[Chan 1986] and does not increase the number of clauses 
in the theory. 

In [Cox et al. 1991] it is shown that surface deduction 
satisfies a weak deductive completeness provided that the 
input clauses are first transformed to separated form. As 
an application of this result, equational implication for 
goal clauses is found to have a simple syntactic charac- 
terization analogous to subsumption. 

Once an explanation E is obtained by surface deduc- 
tion, in what form should E be presented? For example 
if $\neg E$ (the actual clause deduced) is given by

$\leftarrow x \doteq a,\ y \doteq b,\ y \doteq c,$

then $\leftarrow y \doteq b,\ y \doteq c$ is equationally equivalent to $\neg E$.
Therefore the atom $x \doteq a$ is irrelevant and should be
removed. In Section 4 it is shown that the character- 
ization of equational implication for goal clauses given 
in [Cox et al. 1991] implies that for every goal clause C 
there is a uniquely defined equational residuum RES(C) 
which cannot be further reduced without weakening 
the corresponding explanation. The notion of equa- 
tional residuum is related to that of prime implicates 
used in switching theory [Kohavi 1978], truth mainte- 
nance systems [Reiter and de Kleer 1987] and diagnoses 
[de Kleer et al. 1988]. RES(C) is an equational prime 
implicate of a flattening of C. 



In Section 2 the terminology is established; in Sec- 
tion 3 surface deduction is defined and the completeness 
results needed for abductive reasoning are given. In Sec- 
tion 4 the formalism of abductive reasoning with surface 
deduction is discussed; and finally in Section 5 an exam- 
ple is presented of an abductive problem solved by using 
surface deduction. 

2 Preliminaries 

Familiarity with logic programming is assumed (see
e.g. [Lloyd 1984]). As in [Holldobler 1990], let = denote
the equality predicate symbol. The usual equality symbol
= is used exclusively for syntactic equality. If L is
an atom and C = {M_1, ..., M_n} is a set of atoms, then
L :- C denotes the Horn clause L ∨ ¬M_1 ∨ ... ∨ ¬M_n. In
this expression, L is the head and C is the body of the
clause. A clause of the form :- C is a goal clause. The
atoms of C are the subgoals of :- C. A clause of the form
L :- is a fact. If C_1, ..., C_n are sets of atoms and C is
the union of the C_i, then L :- C_1, ..., C_n means L :- C.
When possible, set notation is omitted for one-element
sets.

If OP is an operation which maps clauses to clauses
and A is a set of clauses, then OP(A) = {OP(C) | C ∈ A}.
Let σ be a substitution. If x_i σ = t_i for i = 1, ..., n
and xσ = x for all other variables, then σ is denoted by
{x_1 ← t_1, ..., x_n ← t_n}. A substitution σ is variable-pure
iff xσ is a variable for every variable x.

The expression ‘most general unifier’ is abbreviated
by ‘mgu’. An equality is an atom of the form s = t. Let
ℰ be the set of equality axioms other than x = x. If
A and B are sets of clauses, then A satisfies (or implies)
B iff every model of A is a model of B. A equationally
satisfies (or implies) B iff A ∪ ℰ ∪ {x = x} satisfies B.
A and B are (equationally) equivalent iff each (equationally)
satisfies the other. A is equationally inconsistent iff
A equationally implies the empty clause.

3 Surface Deduction 

In surface deduction, a refutation of a set of input clauses
proceeds by first transforming the input clauses to a flat
form and then refuting the result using resolution, factoring
and compression. The transformation subsumes
the equality axioms other than reflexivity. The rules of
factoring and compression subsume reflexivity.

Definition. Let C be a clause and t a term. An occur- 
rence of t on the left-hand side (right-hand side) of an 
equality t = s (s = t) in C is a root ( surface ) occurrence 
of t in C . Every other occurrence of t is an internal oc- 
currence of t. The term t is a root term of C iff it has 
a root occurrence in C. Surface and internal terms are 
defined analogously. 

Definition. A clause C is flat iff

(i) every atom of C is of the form P(x_1, ..., x_n),
x = f(x_1, ..., x_n) or x = y, and

(ii) no surface variable of C is a root or internal
variable of C.

Definition. Let C be a Horn clause. An elementary 
flattening of C is obtained by either 

(i) replacing some of the non-surface occurrences 
of a non-variable term t by a new variable y and 
adding the equality y = t to the body, 

or 

(ii) replacing some of the surface occurrences of a 
root or internal variable x of C by a new variable 
y and adding the equality x = y to the body. 

An elementary flattening of the set of clauses A is ob- 
tained by replacing a clause in A by an elementary flat- 
tening of that clause. 

Modifying a clause C by successive elementary flat- 
tenings eventually results in a flat clause (a flattening of 
C ) which cannot be flattened any further (Theorem 2 
of [Cox and Pietrzykowski 1986]). 
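
As a rough illustration of rule (i) of elementary flattening, the following Python sketch repeatedly names non-variable subterms with new variables and records the corresponding equalities. The term encoding (variables as strings, compound terms and constants as tuples), the function name `flatten_goal` and the fresh-variable prefix `_v` are assumptions made for this sketch, not notation from the paper. Rule (ii), which renames surface occurrences of root or internal variables, is deliberately left out.

```python
import itertools


def is_var(term):
    # Assumption of this sketch: variables are plain strings,
    # compound terms and constants are tuples such as ("f", "x") or ("a",).
    return isinstance(term, str)


def flatten_goal(atoms):
    """Flatten the atoms of a goal clause using rule (i) only.

    Equalities are encoded as ("=", lhs, rhs); any other atom is
    encoded as (predicate, arg_1, ..., arg_n).
    """
    fresh = (f"_v{i}" for i in itertools.count(1))
    out = []

    def name(term):
        # Return a variable standing for `term`; for a non-variable
        # term, introduce a new variable y and add the equality y = term.
        if is_var(term):
            return term
        y = next(fresh)
        out.append(("=", y, shallow(term)))
        return y

    def shallow(term):
        # Keep the outermost symbol, but make every argument a variable.
        return (term[0],) + tuple(name(arg) for arg in term[1:])

    for atom in atoms:
        if atom[0] == "=":
            _, lhs, rhs = atom
            # The left-hand side must become a (root) variable.
            out.append(("=", name(lhs), rhs if is_var(rhs) else shallow(rhs)))
        else:
            out.append(shallow(atom))
    return out


# The goal clause :- Vm(e, q) = 2 from the example in Section 5.
goal = [("=", ("Vm", ("e",), ("q",)), ("2",))]
for atom in flatten_goal(goal):
    print(atom)
```

Up to the names of the variables, the result for this goal corresponds to the flat clause :- x_1 = 2, x_1 = Vm(x_2, x_3), x_2 = e, x_3 = q.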

Definition. Let C be a clause. Then FLAT(C) denotes
an (arbitrary but fixed) flattening of C.

For any set of clauses A, FLAT(A) is equationally
equivalent to A. In [Cox et al. 1991] it is shown that for
refutation completeness the transformation FLAT subsumes
the substitutivity axioms but not transitivity and
symmetry.

In order to subsume transitivity and symmetry, we 
need another transformation. 

Definition. Let C be a clause with an equality in its
head. Then C is symmetric iff C is of the form

x = u :- x = v, s = v, y = u, y = t, M

for some terms s and t and set of atoms M, where x, y,
u and v do not occur in M, s or t. The set of clauses A
is symmetrized iff every clause C of A with an equality
in its head is symmetric.

Definition. Let C be a Horn clause. If C does not
have an equality in its head or if C is symmetric, then the
symmetrization SYM(C) of C is C. If C is not symmetric
and of the form s = t :- M, then SYM(C) is given by

x = u :- x = v, s = v, y = u, y = t, M.
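
The symmetrization step can be sketched in Python as follows. The clause encoding (a head atom plus a list of body atoms, with equalities written as `("=", lhs, rhs)`), the function name `symmetrize` and the `prefix` used for the four new variables are assumptions of this sketch. It assumes the input clause is not already symmetric and that no variable of the clause starts with the chosen prefix; it does not test either condition.

```python
def symmetrize(head, body, prefix="_s"):
    """Return SYM of the clause `head :- body` as a (head, body) pair.

    A clause whose head is not an equality is returned unchanged.
    """
    if head is None or head[0] != "=":
        return head, list(body)
    _, s, t = head
    # Four new variables playing the roles of x, u, v and y in the definition.
    x, u, v, y = (f"{prefix}{i}" for i in range(1, 5))
    new_body = [("=", x, v), ("=", s, v), ("=", y, u), ("=", y, t)] + list(body)
    return ("=", x, u), new_body


# Hypothetical usage on a flat clause of the shape x1 = 0 :- x1 = Vt(x2), x2 = q.
head, body = symmetrize(
    ("=", "x1", ("0",)),
    [("=", "x1", ("Vt", "x2")), ("=", "x2", ("q",))],
)
print(head)
for atom in body:
    print("   ", atom)
```

The original head s = t survives only as the two body equalities s = v and y = t, which is what lets a resolution step against the new head x = u stand in for the symmetry and transitivity axioms.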

Note that if A is a set of Horn clauses, then SYM(A)
is equationally equivalent to A, and if A is flat, then
SYM(A) is flat. In [Cox et al. 1991] it is shown that
the transformation SYM subsumes transitivity and symmetry.
In order to subsume substitutivity, transitivity
and symmetry, the transformations SYM and FLAT are
composed.

Flattening and symmetrization followed by SLD- 
resolution using resolution with x = x as an additional 
deduction rule is refutation complete for logic program- 
ming with equality. However, weak deductive complete- 
ness is not satisfied [Cox et al. 1991]. In order to obtain 
weak deductive completeness an additional transforma- 
tion is required. 

Definition. A positive (negative) root occurrence of 
the term t in the clause C is a root occurrence in the 
head (body) of C. 

Definition. The flat clause C is separated in the vari- 
able x iff 

(i) every literal of C has at most one occurrence of 
x, 

(ii) C has at most one internal occurrence of x, and 

(iii) if x has an internal occurrence in C , then x has 
a negative root occurrence in C. 

The clause C is separated iff C is separated in all its 
variables. 

If A is a set of separated flat Horn clauses, then
SYM(A) is separated. Separated clauses can be obtained
from a given flat clause by using the transformation SEP:

Definition. Let C be a flat clause and x a variable.
The clause SEP(C) is the separated flat clause obtained
by applying the following transformation to C: For every
variable x such that C is not separated in x, replace each
internal occurrence of x by a new variable x_i and add
the equalities x = y, x_1 = y, x_2 = y, ... to the body of C
(where y is a new surface variable).

The rules of factoring and compression used in surface 
deduction are: 

(i) Root factoring. The clause C' is a root factor of C 
iff C' is obtained by factoring two equalities of C 
with the same root variable. 

(ii) Surface factoring. The clause C' is a surface factor 
of C iff C' is obtained by factoring two equalities 
of C with the same surface term. 

(iii) Root compression. The clause C' is a root compression
of C iff C' is obtained by removing an equality
x = t from the body of C, where x has only one
occurrence in C.

(iv) Surface compression. The clause C' is a surface 
compression of C iff C' is obtained by removing an 
equality x = y from the body of C, where y has 
only one occurrence in C. 

A compression is a root or surface compression. A compression
of a clause C is a clause C' obtained from C by
a sequence of applications of compression rules.
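
The two compression rules can be sketched in Python for a flat goal clause as below. The encoding (variables as strings, other terms as tuples, equalities as `("=", lhs, rhs)`) and the names `variables`, `occurrences` and `compress` are assumptions of this sketch. It assumes the input is flat, so that the left-hand side of every equality is a variable, and it applies compressions exhaustively, which yields one particular compression of the clause.

```python
def variables(term):
    # Yield every variable occurrence in a term (variables are strings).
    if isinstance(term, str):
        yield term
    else:
        for arg in term[1:]:
            yield from variables(arg)


def occurrences(atoms):
    # Count how often each variable occurs in the whole clause.
    counts = {}
    for atom in atoms:
        for arg in atom[1:]:
            for v in variables(arg):
                counts[v] = counts.get(v, 0) + 1
    return counts


def compress(atoms):
    """Apply root and surface compression until neither rule applies."""
    atoms = list(atoms)
    changed = True
    while changed:
        changed = False
        counts = occurrences(atoms)
        for atom in atoms:
            if atom[0] != "=":
                continue
            _, lhs, rhs = atom
            # Root compression: x = t where x has only one occurrence.
            root_ok = counts[lhs] == 1
            # Surface compression: x = y where y has only one occurrence.
            surface_ok = isinstance(rhs, str) and counts[rhs] == 1
            if root_ok or surface_ok:
                atoms.remove(atom)
                changed = True
                break
    return atoms


# The goal clause :- x = a, y = b, y = c from the introduction.
clause = [("=", "x", ("a",)), ("=", "y", ("b",)), ("=", "y", ("c",))]
print(compress(clause))
```

On this clause only x = a is removed, since y occurs in two equalities; this matches the earlier remark that the atom x = a is irrelevant to the explanation.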

The soundness of root and surface factoring and 
compression (in the presence of equality) is shown 
in [Cox and Pietrzykowski 1986]. Observe that binary 
resolution, surface and root factoring and compres- 
sion preserve flatness. The relationship between fac- 
toring, compression and resolution with the reflexiv- 
ity axiom is determined by the following result (proved 
implicitly in [Cox and Pietrzykowski 1986] and explic- 
itly in [Cox et al. 1991]; see also [Hoddinott and Elcock 
1986]): 

Theorem 3.1 Let :- C be a flat goal clause. If :- C'
is a flat goal clause obtained from :- C by a sequence
of binary resolutions with x = x, then :- C' can be
obtained from :- C by a sequence of root and surface
factorings and compressions.

Definition. Let A be a set of flat Horn clauses. The
flat goal clause C is S-deducible from A iff C can be
obtained from A by a sequence of binary resolutions,
surface and root factorings and compressions. Note that
we can assume that the deduction is linear. A is
S-refutable iff the empty clause is S-deducible from A.

To state the weak deductive completeness result for 
flat, separated and symmetrized clauses, we need the 
transformation defined next. 

Definition. Let C be a flat goal clause. Then C 
is reduced iff C has no surface variables and no two 
equalities of C have the same right-hand sides. A flat 
reduced clause REDU( C) is obtained from C by 
factoring equalities with identical right-hand sides un- 
til all right-hand sides are distinct, and by removing 
all remaining equalities with surface variables by surface 
compression. Note that for every flat goal clause C, 
REDU( C ) is equationally equivalent to C . 

Theorem 3.2 [Cox et al. 1991] Let :- C be a goal
clause and A a set of Horn clauses which includes
the function substitutivity axioms. Then A equationally
implies :- C iff there is a flat goal clause
:- C' such that for some variable-pure substitution σ,
:- C'σ ⊆ REDU(FLAT(:- C)) and :- C' is S-deducible
from SYM(SEP(FLAT(A))).

As an application of this result, the following theorem 
is proved in [Cox et al. 1991]: 

Theorem 3.3 Let A and B be goal clauses. Then
A equationally implies B iff there is a variable-pure
substitution σ such that a compression of FLAT(A)σ
is included in REDU(FLAT(B)).



Definition. Let :- C be a goal clause. An equa- 
tional residuum of :- C is a minimal subclause of 
REDU(FLAT( :- C )) which is equationally equivalent to 
:-C. 

Every equational residuum of :- C is equationally
equivalent to :- C. The fact that every subclause of a
reduced clause is reduced implies that if :- C' is an
equational residuum of :- C, then :- C' is reduced. The next
theorem shows that the equational residuum is unique.

Theorem 3.4 [Cox et al. 1991] Let :- A' and :- B' be
equational residua of the goal clauses :- A and :- B
respectively. Then :- A is equationally equivalent to :- B
iff :- A' is a variant of :- B'.

4 Abduction using Surface Deduction

An existential conjunction of facts is a conjunction of 
facts with all its free variables quantified existentially. 
The abduction problem for Horn clause logic with equal- 
ity can be stated as follows: 

Abduction Problem: An abduction problem is a pair
(A, O), where A is a theory of Horn clauses and O (the
observation) is an existential conjunction of facts. An
explanation of the abduction problem (A, O) is an
existential conjunction of facts E consistent with A such
that E and A equationally imply O.

Let ¬O and ¬E denote the disjunctions of the negations
of the constituent facts of O and E respectively.
Since E and A equationally imply O iff ¬O and A equationally
imply ¬E, a solution to an abduction problem
can be obtained by deducing a clause C from A and ¬O,
and negating C to obtain E.

In general, it is desirable for an explanation E of
an abductive problem (A, O) to have certain additional
properties (see [Cox and Pietrzykowski 1987]). For example,
an explanation E should not contain any facts
not required to yield the observation from A (the parsimony
principle). Thus if E and E' are explanations
of (A, O) and E equationally implies E', then E' is preferred
over E. (Here ‘preferred’ is to be understood as ‘at least
as good as’.)

For abduction, a desirable property of a deduction
system is that for every explanation E of an abductive
problem (A, O), one can obtain an explanation preferred
over E. The weak completeness result of Theorem 3.2
implies that surface deduction with separated clauses
and the function substitutivity axioms has this property.

Theorem 4.1 Let (A, O) be an abductive problem,
where A contains the function substitutivity axioms.
Then for every explanation E of (A, O),
there is an explanation E' preferred over E such
that ¬E' is S-deducible from SYM(SEP(FLAT(A))) ∪
{SEP(FLAT(¬O))}.

Proof. This follows by Theorem 3.2 and the fact that
¬O is a goal clause, so that it does not need to be
symmetrized. ■

Fortunately, it appears that the function substitutiv- 
ity axioms are rarely needed in abductive problems when 
using surface deduction with separated clauses. 

Flattenings of a clause can be viewed as alternate
representations of the clause’s term structure and are
therefore essentially equivalent. Without loss of generality
we restrict our attention to explanations E such that
¬E is flat (flat explanations).

If E and E' are explanations of (A, O) such that E
equationally implies E' but is not equationally equivalent
to E', then E' is strictly preferred over E. Given
an explanation E of (A, O) there are many equationally
equivalent existential conjunctions of facts, all of which
are also explanations of (A, O). The preference criteria
introduced so far do not distinguish among equationally
equivalent explanations. Using the intuition that a “simpler”
explanation should be preferred, we give a stronger
definition of preference:

Definition. Let E and E' be flat explanations. Then
E' is strictly preferred over E iff either E equationally
implies E' but is not equivalent to E', or E is equationally
equivalent to E' and E' has fewer atoms.

Given these preference criteria, we have the following 
theorem which determines the most preferred flat expla- 
nation among equationally equivalent ones: 

Theorem 4.2 For any explanation E, if E' is the negation
of the equational residuum of ¬E, then E' is the
unique most preferred flat explanation among flat
explanations equationally equivalent to E.

Proof. Let :- A be a flat clause equationally equivalent
to ¬E. If :- A is not reduced, then REDU(:- A)
has fewer atoms than :- A and the corresponding explanation
is therefore strictly preferred. Assume that :- A
is reduced. If the equational residuum of :- A is not
given by :- A, then the equational residuum of :- A has
fewer atoms than :- A, so that the corresponding explanation
is strictly preferred. The result now follows by the
uniqueness theorem for equational residua, Theorem 3.4.



5 An Application 

Examples from the domain of story comprehension and
motivation analysis which demonstrate the need for the
inclusion of equality in abductive reasoning are given
in [Charniak 1988]. Here we give an example from a
different domain.

Consider the following (imaginary, but realistic) sit- 
uation. A researcher X experimentally determines the 
value of a quantity associated with a physical object (e.g. 
the mass of an isotope of an element) and sends us the 
result. We have independently obtained a value for the 
same quantity (by theory and/or experiment) and our 
value differs from X’s value. We believe our value to 
be correct and we would like to explain the discrepancy. 
We do not know the exact means by which X’s value 
was obtained, but we know what kinds of experimental 
apparatus X might have used. One kind of apparatus 
(type A) is notorious for a hard-to-control drift in the 
settings which results in a systematic bias in the read- 
ings. Thus we can explain the discrepancy between our 
and X’s values by hypothesizing that X used apparatus 
of type A with a systematic bias equal to the difference 
between the two values. 

The situation is formalized as follows: Let TA(x)
mean that x is an apparatus of type A. Let Vt(y) be the
true value of quantity y, Vm(z, y) the value of quantity
y measured in experiment z, A(u) the apparatus used in
experiment u and B(x) the systematic bias of apparatus
x. The quantity measured by X is q, and the experiment
performed by X is given the name e. With these
definitions, our knowledge T consists of the clauses

T1: Vt(q) = 0 :-

T2: Vm(x_1, x_2) = Vt(x_2) + B(A(x_1)) :- TA(A(x_1))

T3: x_1 = 0 + x_1 :-

where knowledge about other types of apparatus and theorems
about real numbers other than T3 have been omitted.
The observation O is given by

O: Vm(e, q) = 2

The first task is to obtain a flattening of T and the
negation of the observation:

fT1: x_1 = 0 :- x_1 = Vt(x_2), x_2 = q.

fT2: x_4 = x_5 + x_6 :- TA(x_3), x_6 = B(x_3),
     x_4 = Vm(x_1, x_2), x_5 = Vt(x_2), x_3 = A(x_1).

fT3: x_1 = x_2 + x_1 :- x_2 = 0.

fO: :- x_1 = 2, x_1 = Vm(x_2, x_3), x_2 = e, x_3 = q.

The clauses fT1 and fO are separated. Separated
clauses for fT2 and fT3 are given by

sfT2: x_4 = x_5 + x_6 :- TA(x_3), x_6 = B(x_7), x_3 = x_8,
      x_7 = x_8, x_4 = Vm(x_1, x_2), x_5 = Vt(x_10), x_2 = x_9,
      x_10 = x_9, x_3 = A(x_11), x_1 = x_12, x_11 = x_12.

sfT3: x_1 = x_2 + x_3 :- x_3 = x_4, x_1 = x_4, x_2 = 0.

All clauses of T have equalities in their heads and
need to be symmetrized. The fully transformed set of
clauses is given by

T1': x_3 = x_4 :- x_3 = x_5, x_1 = x_5, x_6 = x_4, x_6 = 0,
     x_1 = Vt(x_2), x_2 = q.

T2': x_13 = x_14 :- x_13 = x_15, x_4 = x_15, x_16 = x_14,
     x_16 = x_5 + x_6, TA(x_3), x_6 = B(x_7), x_3 = x_8,
     x_7 = x_8, x_4 = Vm(x_1, x_2), x_5 = Vt(x_10), x_2 = x_9,
     x_10 = x_9, x_3 = A(x_11), x_1 = x_12, x_11 = x_12.

T3': x_5 = x_6 :- x_5 = x_7, x_1 = x_7, x_8 = x_6,
     x_8 = x_2 + x_3, x_3 = x_4, x_1 = x_4, x_2 = 0.

O': :- x_1 = 2, x_1 = Vm(x_2, x_3), x_2 = e, x_3 = q.



The negation of the desired explanation can now be
deduced from O'. In the deduction below, the literals
involved in each step are underlined. As is usually the
case, the function substitutivity axioms are not needed.

root fact., surf.       :- x_19 = 2, x_19 = x_31, x_25 = x_31,
fact. and compr.           x_9 = x_28, x_25 = x_28, TA(x_6),
                           x_9 = B(x_10), x_6 = x_11, x_10 = x_11,
                           x_8 = 0, x_8 = Vt(x_3), x_6 = A(x_14),
                           x_2 = x_15, x_14 = x_15, x_2 = e, x_3 = q.

root fact., surf.       :- x_9 = 2, TA(x_6), x_9 = B(x_10),
fact. and compr.           x_6 = x_11, x_10 = x_11, x_8 = 0,
                           x_8 = Vt(x_3), x_6 = A(x_14), x_2 = x_15,
                           x_14 = x_15, x_2 = e, x_3 = q.

res. with T1'           :- x_9 = 2, TA(x_6), x_9 = B(x_10),
                           x_6 = x_11, x_10 = x_11, x_8 = x_21,
                           x_17 = x_21, x_22 = 0, x_17 = Vt(x_18),
                           x_18 = q, x_8 = Vt(x_3), x_6 = A(x_14),
                           x_2 = x_15, x_14 = x_15, x_2 = e, x_3 = q.

surf. fact., root       :- x_9 = 2, TA(x_6), x_9 = B(x_10),
fact. and compr.           x_6 = x_11, x_10 = x_11, x_6 = A(x_14),
                           x_2 = x_15, x_14 = x_15, x_2 = e, x_3 = q.

reduction to the        :- x_6 = A(x_2), x_2 = e, TA(x_6),
min. residuum              x_9 = B(x_6), x_9 = 2.

The last clause is the negation of the desired explanation.
Note how two resolutions with T1' were used to
simulate symmetry.



6 Conclusion 

From a theoretical perspective, surface deduction is very 
appealing in its simplicity. We have seen how (at least 
in theory) surface deduction can be applied in situations 
such as abductive reasoning where deduction rather than 
refutation is the primary goal. 

If the equality theory of interest contains function 
substitutivity, a problem with using surface deduction 
for abduction is that in general the function substitutiv- 
ity axioms are still required. Current research indicates 
that to a large extent, the function substitutivity axioms 
can be ignored in abductive problems when using surface 
deduction with symmetrized, separated and flat clauses. 
We do not know any practical example where this is not 
the case. 

From a practical point of view, one of the frequently 
recognized problems with flattening the clauses of the 
input theory is that one loses most of the advantages of 
unification, particularly if the input theory contains few 
equalities. One can regain some of these advantages in 
practice by interpreting the set of equalities in the body 
of a clause as a directed graph or hypergraph (with arcs 
from the root variables to the surface terms) which de- 
fines the set of possible definitions of the main terms 
and variables of the clause. Such a directed graph gen- 
eralizes the usual tree representation of terms. Unifi- 
cation and more generally term rewriting can then be 
replaced by (hyper)graph rewriting rules. To implement 
this idea, the deduction procedures must be substantially
enhanced. The types of graph rewriting rules and graph 
representations needed require further research. 

The preference criteria for explanations given in Sec- 
tion 4 are very weak. However, we believe that no matter 
what preference criteria are used, RES(C) is at least as 
good an explanation as C. One of the most important 
problems in abductive reasoning is to determine stronger 
preference criteria to avoid combinatorial explosion. 
These issues are discussed in [Poole and Provan 1990]. 

Many of the results used in this paper can be general- 
ized to arbitrary clauses so that the restriction of abduc- 
tive reasoning to Horn clause theories can be removed. 
These generalizations will be the topic of a forthcoming 
paper. 

Abstract

This paper presents a form of reasoning called 
“hypothetico-deduction”, that can be used to address 
the problem of multiple explanations which arises in 
the application of abduction to knowledge assimilation 
and diagnosis. 

In a framework of hypothetico-deductive reasoning 
the knowledge is split into the theory T and observable 
relations S which may be tested through experiments. 
The basic idea behind the reasoning process is to 
formulate and decide between alternative hypotheses. 
This is performed through an interaction between the 
theory and the actual observations. The technique 
allows this interaction to be user mediated, permitting 
the acquisition of further information through 
experimental tests. Abductive explanations which have 
all their empirical consequences observed are said to be 
“fully corroborated”. 

We set up the basic theoretical framework for 
hypothetico-deductive reasoning and develop a 
corresponding proof procedure. We demonstrate how 
hypothetico-deductive reasoning deals with one of the 
main characteristics of common-sense reasoning, 
namely incomplete information, through the use of 
partial corroboration. We study the extension of basic 
hypothetico-deductive reasoning applied to theories 
that incorporate default reasoning as captured by 
negation-as-failure (NAF) in Logic Programming. This
is applied to the domain of Temporal Reasoning, where 
NAF is used to formulate default persistence. We show 
how it can be used successfully to tackle typical 
problems in this domain. 

1 Motivation 

Abduction is commonly adopted as an approach to
diagnostic reasoning [Reggia & Nau, 1984], [Poole,
1988]. However, there are frequently many possible
abductive explanations for a given observation. This is
the problem of "multiple explanations". In order to
choose between these explanations it becomes
necessary to collect more information. Consider the
Crime Detection example formalized below (Theory
T1).

Suppose we arrive at the scene of the crime and the
first observation we make is that someone is dead. We
seek an explanation for this on the basis of the theory
T1. Suppose we accept that there are only three
possible causes of death: being strangled, being
stabbed, or drinking arsenic (these are technically
known as the abducibles). Simple abduction starting
from the observation "dead" yields precisely these three
possible explanations. In order to choose between these
multiple explanations, we need to collect more
information. For example, if we examined the corpse
and discovered that there were marks on the neck, we
might take this as evidence for the first explanation
over the others. Moreover, we know that drinking
arsenic also has the consequence of leaving the victim
with a blue tongue, so we might like to look for that.

Theory T1

strangled → dead             strangled → neck_marks
blood_loss → dead            stabbed → blood_loss
poisoned → dead              drunk_arsenic → poisoned
drunk_arsenic → blue_tongue
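
A minimal Python sketch of simple abduction over Theory T1 is given here, assuming the propositional rules are stored as (antecedent, consequent) pairs and that the rule set is acyclic. The names `RULES`, `ABDUCIBLES`, `explanations` and `consequences` are hypothetical and chosen only for this illustration.

```python
# Theory T1 as (antecedent, consequent) pairs.
RULES = [
    ("strangled", "dead"),
    ("strangled", "neck_marks"),
    ("blood_loss", "dead"),
    ("stabbed", "blood_loss"),
    ("poisoned", "dead"),
    ("drunk_arsenic", "poisoned"),
    ("drunk_arsenic", "blue_tongue"),
]
ABDUCIBLES = {"strangled", "stabbed", "drunk_arsenic"}


def explanations(goal):
    """Reason backwards from `goal` to the abducibles that would yield it."""
    if goal in ABDUCIBLES:
        return {goal}
    found = set()
    for cause, effect in RULES:
        if effect == goal:
            found |= explanations(cause)
    return found


def consequences(hypothesis):
    """Reason forwards from `hypothesis` to everything it implies."""
    derived = {hypothesis}
    frontier = [hypothesis]
    while frontier:
        fact = frontier.pop()
        for cause, effect in RULES:
            if cause == fact and effect not in derived:
                derived.add(effect)
                frontier.append(effect)
    return derived - {hypothesis}


print(sorted(explanations("dead")))
print(sorted(consequences("drunk_arsenic")))
```

Backward reasoning from "dead" returns the three abducibles, while forward reasoning from "drunk_arsenic" lists the further consequences, such as "blue_tongue", that one might go and look for.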

One approach to deciding between multiple 
explanations is through the performance of crucial 
experiments ([Sattar & Goebel, 1989]): pairs of 
explanations are examined for contradictory 
consequences, and an experiment is performed which 
refutes one of them whilst simultaneously 
corroborating the other. With n competing 
explanations we must thus perform at most (n-1) 
crucial experiments.

The crucial experiment approach is, however, unable
to choose between explanations when they fail to have
contradictory consequences or when they have
contradictory consequences that are not empirically
determinable (e.g. Tychonic and Copernican world
systems). In our example, for instance, the explanations
"strangled" and "stabbed" are not incompatible. It is
possible that the victim was both strangled and stabbed.
As a result, there can be no crucial experiment that will
decide between the two. However, further evidence
might lead us to accept one explanation, whilst
tentatively rejecting the other. For example, knowledge
that the person exhibits marks on the neck supports the
"strangled" hypothesis. In fact we have all the
theoretically necessary observations to conclude that
the victim was strangled. On the other hand, the
"stabbed" hypothesis implies "blood_loss", which if not
observed might lead us to favour the "strangled"
explanation. Note that later evidence of blood loss
would lead us to return to the "stabbed" hypothesis (in
addition to "strangled"). From our viewpoint, crucial
experiments are the special case of general
hypothetico-deductive reasoning in which one hypothesis is
refuted whilst a second is simultaneously corroborated.

The process of hypothetico-deductive reasoning 
allows the formation and testing of hypotheses within 
an interactive framework which is applicable to a wide 
class of applications and is implementable using
existing technology for resolution. 

The technique of hypothetico-deductive reasoning 
has its origin in the Philosophy of Science. It was 
primarily proposed by opponents of Scientific 
Induction. Its notable contributors were Karl Popper 
([Popper, 1959], [Popper, 1965]), and Carl Hempel 
[Hempel, 1965]. In its original context, hypothetico- 
deduction is a method of creating scientific theories by 
making an hypothesis from which results already 
obtained could have been deduced and which entails 
new predictions that can be corroborated or refuted. It 
is based on the idea that hypotheses cannot be derived 
from observation, but once formulated can be tested 
against observation. 

The hypothetico-deductive mechanism we formulate
resembles this method in having the two components of
hypothesis formation and corroboration. It differs from 
the accepted usage of the term in philosophy of science 
by the status of the hypothesis formation component. 

In the philosophy of science the process of hypothesis
formation is equivalent to theory formation: a creative
process in which a complete theory is constructed to 
account for the known observations. By contrast, the 
method we describe here starts with a fixed generalized 
theory which is assumed to be complete and correct. 
The task is to construct some hypotheses which when 
added to the theory have the known observations as 
logical consequences. The process is more akin to that 
used by an engineer when they apply classical 
mechanics to a particular situation: they don’t seek a 
new physical theory, but rather a set of hypotheses 
which would explain what they have observed. Since, 
for us, hypothesis formation can be mechanized, we do 
not have to tackle the traditional issues of the 
philosophy of science concerning the basis of theory 
formation. We thus avoid (like Poole before us [Poole, 
1988, p.28]) one of the most difficult problems of 
science. 

This paper is organized as follows. We first describe 
the reasoning process and present the logical structure 
of the reasoning mechanism, indicating how it relates to 
classical deduction and model theory. Abductive and 
corroborative derivation procedures for implementing 
the reasoning process are then defined through 
resolution. We indicate how this reasoning technique 
relates to current work on abduction and diagnostic 
reasoning, and suggest some possible extensions. We 
illustrate the features and applicability of this reasoning 
method with several examples. We then describe the 
extension of hypothetico-deduction to apply to theories 
which include some form of default reasoning, using 
negation-as-failure as an example. We consider a 
typical application of defaults in causal reasoning, 
namely default persistence, and provide several further 
examples which illustrate this extension. 

2 Hypothetico-deductive Framework 

Suppose we have a fixed logical theory T about the 
world. For example, it might be a medical model of the 
anatomy, or a representation of the connections in an 
electrical network, or a model of the flow of urban 
traffic in Madrid. Let us divide the relations in the 
theory into two categories: empirical and theoretical. 
How we make this distinction will depend on how we 
interpret these relations in the domain for the theory. 
An empirical relation is one which can be (or has been) 
observed. For example, the blood pressure of a patient, 
the status of a circuit-breaker (open or closed), or the 
number of cars passing some point. By contrast, a 



theoretical relation is in principle not observable. 
Examples of theoretical relations might be infection 
with an influenza virus, the occurrence of a short-circuit 
from the viewpoint of a control centre, or the density of 
traffic at some point. 

Suppose we want an explanation for G on the basis 
of the theory. By this, what we mean is “what relations 
(we will call them hypotheses ) might be true in order to 
have given rise to G?”. The answer to this question 
could involve either theoretical or empirical relations. 
In order to be confident that an explanation is the 
correct explanation it is useful to test it. Explanations in 
terms of empirical relations are directly testable. In the 
simplest case we just consider the other observations we 
have already made; in more complicated cases, we may 
need to “go and look” or even perform an 
“experiment”. Explanations in terms of theoretical 
relations must be tested indirectly, by deducing their 
empirical consequences, and testing these. 

Unfortunately, not all hypotheses that might give rise 
to the observation G serve as explanations, regardless of
whether they pass any tests. Some are too trivial, such
as taking G as an explanation for itself. Others we rule 
out as unsuitably shallow. For example, suppose we 
sought an explanation for the observation “Jo laughed 
at the joke”; one possible hypothesis is because “the 
joke was funny”. However, what we really wanted was a 
deeper explanation: Why was the joke funny? We 
therefore designate certain types of hypotheses as 
explanatory (or, more strictly, “abducible”). 

The problem of explanation, as far as we are 
concerned in this paper, is the problem of constructing 
abducible hypotheses which when we add them to T 
will have G as a logical consequence. Furthermore, 
explanations must pass (direct or indirect) tests. 

The process of constructing hypotheses which have 
G as a deductive consequence is an example of 
hypothesis formation. It is this stage that corresponds 
to the “hypothetico-” component of hypothetico-deductive
reasoning. The process of testing an
explanation is an example of corroboration. It is this 
stage that corresponds to the “deductive” component 
of hypothetico-deductive reasoning. This is because we 
use deduction to determine the empirical consequences 
of a given explanation. The process of hypothetico- 
deductive reasoning can now be formulated as the 
construction of an explanation for an observation 
through interleaving hypothesis formation and 
corroboration. 

3 The Hypothetico-deductive 
Mechanism 

Let us consider the mechanism for hypothetico- 
deductive reasoning in more detail. To simplify matters 
we shall require that our theory is composed of rules 
and no facts. In logical terms, an hypothesis (and thus 
an explanation) will be a set of ground atomic well- 
formed formulae. 

Suppose we have a (usually causal) theory T, an 
observation set O, a set of abducible atomic formulae A, 
and a particular observation G from O which we wish to 
explain. Let O' = O − {G}. In addition we define a set S, the 
observables, containing all the formulae that can occur 
in O. 

There are three components to the reasoning 
process: hypothesis formation, hypothesis 
corroboration, and explanation corroboration. In 
outline, we carry out hypothesis formation on G, and 
then on each component formula of the resultant 
hypothesis, repeating this process until all that remains 
is a set of abducible relations constituting the 
explanation. We also carry out hypothesis 
corroboration at each formation point. Finally we 
reason forwards from the explanation to perform 
explanation corroboration. 

Hypothesis Formation 

From any ground atomic formula F we form an 
hypothesis for that formula. This is done by 
determining which rules in T might allow F as a 
conclusion, and forming an hypothesis from the 
antecedents of each such rule (after carrying out the 
relevant substitutions dictated by F). Each hypothesis is 
thus sufficient to allow the conclusion of F. 

Hypothesis Corroboration 

An hypothesis for an observation may contain 
instances of observables defined by S. For each such 
component we check to see whether it is an observation 
recorded in O'. If it is a member of O' then it is 
corroborated and we can retain it. However, where any 
component is not corroborated in this fashion, we reject 
the entire hypothesis. 

Explanation Corroboration 

An hypothesis H which is composed entirely of 
instances of abducible predicates defined by A is an 
explanatory hypothesis. To corroborate H, we use T to 
reason forwards from H as an assumption. Each logical 
consequence of H which is also an instance of an 
observable is checked against O' for corroboration 
(similar to “hypothesis corroboration”). If it does not 
occur in O' then the original hypothesis H is rejected. If 
all observable consequences are corroborated, then the 
explanation H is said to be corroborated. 

In general, rules may have more than one literal in 
their antecedent. We must also check the satisfaction of 
the other literals in a given rule by reasoning backwards 
until we reach either one of the observations in O' or 
one of the other explanatory hypotheses. If neither of 
these two situations arises, the rule is discarded from the 
forward reasoning process. 

We make a distinction between corroboration failure, 
where an hypothesis or prediction does not occur in the 
observation set O', and refutation, where the negation 
of an hypothesis or prediction occurs in O'. Normally 
the form of O and T means that refutation is impossible 
(see the next section for details of this form). Later we 
suggest an extension which allows the possibility of 
refutation in addition to corroboration failure. In cases 
where it is natural to apply the closed world assumption 
to O, these two situations will coincide. 

4 The Logical Structure of 
Hypothetico-deductive Reasoning 

Suppose we have a theory T composed of definite 
Horn clauses and an observation set of ground atomic 
well-formed formulae O. Let the set of ground atomic 
formulae which can occur in O be S, the observables. 
Similarly, let us define a set of distinguished ground 
atomic formulae A, the abducibles, in terms of which 
all explanations must be constructed. An explanation 
will be a member of the set A. We will assume that the 
theory T alone does not entail any empirical 
observation without some other empirical input, i.e. 
there does not exist any formula $\varphi$ such that $\varphi \in S$ and 
$T \models \varphi$. Consider also a ground atomic formula G (a 
member of S) for which we seek an explanation. 



Given the 4-tuple $\langle T, O, A, S \rangle$, a corroborated 
explanation $\Delta$ for G is a set of ground atomic well- 
formed formulae which fulfils all of the following 
criteria: 

(1) Each formula in $\Delta$ must be a member of A. 

(2) $T \cup \Delta \models G$ 

(3) If $T \cup \Delta \models \pi$ and $\pi \in S$, then $\pi \in O$ 

An explanation set $\Delta$ which satisfies (1) and (2) but not 
(3) is said to be uncorroborated. 

This formulation is easily generalized to explanation 
for multiple observations by simply replacing G with a 
conjunction of ground atomic formulae. 
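
For a ground (propositional) Horn theory, the three criteria can be checked directly by computing the consequences of the theory together with a candidate explanation. The sketch below is an illustration only: the (head, body) rule encoding, the names `consequences` and `classify`, and the toy atoms are assumptions made for this example, and forward chaining is used here in place of the resolution procedure the paper defines later.

```python
def consequences(rules, facts):
    """Least set of ground atoms containing `facts` and closed under `rules`."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            if head not in derived and all(b in derived for b in body):
                derived.add(head)
                changed = True
    return derived


def classify(rules, observed, abducibles, observables, candidate, goal):
    """Check criteria (1)-(3) for a candidate explanation of `goal`."""
    candidate = set(candidate)
    if not candidate <= set(abducibles):
        return "not an explanation"      # criterion (1) fails
    closure = consequences(rules, candidate)
    if goal not in closure:
        return "not an explanation"      # criterion (2) fails
    predictions = closure & set(observables)
    if predictions <= set(observed):
        return "corroborated"            # criterion (3) holds
    return "uncorroborated"


# Hypothetical toy theory: a1 causes s1, a2 causes s2, and either symptom gives g.
T = [("s1", ("a1",)), ("s2", ("a2",)), ("g", ("s1",)), ("g", ("s2",))]
O = {"g", "s1"}
A = {"a1", "a2"}
S = {"g", "s1", "s2"}

print(classify(T, O, A, S, {"a1"}, "g"))
print(classify(T, O, A, S, {"a2"}, "g"))
```

Here `{a1}` predicts only observables that occur in O, while `{a2}` predicts `s2`, which has not been observed, so it satisfies (1) and (2) but not (3).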

We note that since at this stage we have taken our 
theories to be Horn, a simple extension to hypothetico- 
deductive reasoning allows us to distinguish between 
explanation refutation when a prediction is inconsistent 
with observation, and merely the failure of 
corroboration where a prediction is consistent with 
known observations but not present in them. Such an 
extension would allow a hypothetico-deductive system 
to deal with circumstances where our observations 
cannot ever be complete (where we know our fault- 
detection system is itself fallible, for instance). We 
could then discard only those explanations that are 
refuted, and order the remaining ones according to 
their degree of corroboration (corresponding to 
Popper’s notion of verisimilitude, [Popper, 1965]). A 
later section discusses the extension of hypothetico- 
deductive reasoning to theories which include negation- 
as-failure. 

This extended version of hypothetico-deductive 
reasoning is non-monotonic because later information 
might serve to refute a partially corroborated 
explanation. To return to our first example for instance, 
the observation that the victim does not have a blue 
tongue would lead us to reject the hypothesis that they 
had drunk arsenic (even if previously this hypothesis 
had some observational consequences which had been 
observed). 

5 Hypothetico-deductive Proof 
Procedure 

A resolution proof procedure which implements 
hypothetico-deductive reasoning is formally presented 
below. Basically we define two types of derivation: 
abductive derivation and corroboration derivation 
which are then interleaved to define the proof 
procedure. Abductive derivation corresponds to the 
processes of hypothesis formation and corroboration, 
deriving hypotheses for goals. Corroboration derivation 
corresponds to the process of explanation 
corroboration, deriving predictions from goals. There 
are two different ways to interleave the abductive and 
deductive components of the reasoning mechanism. 
One approach is to derive all the abducible literals in 
the hypothesis for an observation, before any of them 
are corroborated. The second approach attempts 
corroboration as soon as an abducible literal is derived, 
postponing consideration of other (non-abducible) 
literals in the hypothesis. Here we present a proof 
procedure based on the second approach. 

Definition (safe selection rule) 

A safe selection rule R is a (partial) function which, 
given a goal $\leftarrow L_1, \ldots, L_k$, $k \geq 1$, returns an atom $L_i$, 
$i = 1, \ldots, k$, such that: 

either i) $L_i$ is not abducible; 

or ii) $L_i$ is ground. 
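
One possible safe selection rule is sketched below. The atom representation (a predicate name with a tuple of arguments, where an argument beginning with an upper-case letter is a variable), the left-to-right preference, and the names `is_ground` and `safe_select` are assumptions of this illustration; the definition itself only constrains which atoms may be returned.

```python
def is_ground(atom):
    """An atom is ground if none of its arguments is a variable (upper-case initial)."""
    _, args = atom
    return not any(arg[:1].isupper() for arg in args)


def safe_select(goal, abducible_preds):
    """Return the leftmost atom that is non-abducible or ground, else None.

    Returning None models the rule being a partial function: it is undefined
    on goals that consist only of non-ground abducible atoms.
    """
    for atom in goal:
        pred, _ = atom
        if pred not in abducible_preds or is_ground(atom):
            return atom
    return None


ABDUCIBLE_PREDS = {"problem_is"}
goal = [("problem_is", ("X",)), ("abdominal_pain_symp", ("nausea",))]
print(safe_select(goal, ABDUCIBLE_PREDS))
```

The non-ground abducible `problem_is(X)` is skipped, so an abducible is only ever selected once it is ground.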
Definition (Hypothetico-deductive proof procedure) 

An abductive derivation from $(G_1\ \Delta_1)$ to $(G_n\ \Delta_n)$ 
via a safe selection rule R is a sequence 

$(G_1\ \Delta_1), (G_2\ \Delta_2), \ldots, (G_n\ \Delta_n)$ 
such that for each $i \geq 1$, $G_i$ has the form $\leftarrow L_1, \ldots, L_k$, 
$R(G_i) = L_j$ and $(G_{i+1}\ \Delta_{i+1})$ is obtained according to one 
of the following rules: 

A1) If $L_j$ is neither an abducible nor an observable, 
then $G_{i+1} = C$ and $\Delta_{i+1} = \Delta_i$, where C is the resolvent 
of some clause in T with $G_i$ on the selected literal 
$L_j$; 

A2) If $L_j$ is observable, then $G_{i+1} = C$ and $\Delta_{i+1} = \Delta_i$, 
where C is the resolvent of $C' : \leftarrow L_1', \ldots, L_j', \ldots, L_k'$ 
with some clause in T on $L_j'$, where $\leftarrow L_1', \ldots, L_{j-1}', L_{j+1}', \ldots, L_k'$ 
is the resolvent of $G_i$ with some 
clause (ground assertion) $L_j'$ in O on the selected 
literal $L_j$; 

A3) If $L_j$ is abducible and $L_j \in \Delta_i$, then 
$G_{i+1} = \leftarrow L_1, \ldots, L_{j-1}, L_{j+1}, \ldots, L_k$ and $\Delta_{i+1} = \Delta_i$; 

A4) If $L_j$ is abducible and $L_j \notin \Delta_i$ and there exists a 
corroboration derivation from $(\{L_j\}\ \Delta_i \cup \{L_j\})$ to 
$(\{\}\ \Delta')$, then $G_{i+1} = \leftarrow L_1, \ldots, L_{j-1}, L_{j+1}, \ldots, L_k$ and 
$\Delta_{i+1} = \Delta'$. 

Step A1) is an SLD-resolution step with the rules of 
T. In step A2), under the assumption that observables 
and abducibles are disjoint, we need to reason backward 
from the true observables in the goal to find 
explanations for them, since the definition of an 
explanation requires that it logically implies G in the 
theory T alone, without the set of observations O. Step 
A3) handles the case where an abductive hypothesis is 
required more than once. In step A4) a new abductive 
hypothesis is required, which is added to the current set 
of hypotheses provided it is corroborated. 

A corroboration derivation from $(F_1\ \Delta_1)$ to $(F_n\ \Delta_n)$ is 
a sequence 

$(F_1\ \Delta_1), (F_2\ \Delta_2), \ldots, (F_n\ \Delta_n)$ 
such that for each $i \geq 1$, $F_i$ has the form $\{H \leftarrow L_1, \ldots, L_k\} \cup F_i'$ 
and $(F_{i+1}\ \Delta_{i+1})$ is obtained according to one of the 
following rules: 

C1) If H is not observable then $F_{i+1} = C' \cup F_i'$, 
where $C'$ is the set of all resolvents of clauses 
in T with $H \leftarrow L_1, \ldots, L_k$ on the atom H, and 
$\Delta_{i+1} = \Delta_i$; 

C2) If H is a ground observable, $H \notin O$ and 
$L_1, \ldots, L_k$ is not empty, then $F_{i+1} = C' \cup F_i'$, 
where $C'$ is $\leftarrow L_1, \ldots, L_k$, and $\Delta_{i+1} = \Delta_i$; if $H \in O$ 
then $F_{i+1} = F_i'$ and $\Delta_{i+1} = \Delta_i$; 

C3) If H is a non-ground observable, $O \not\models \exists x\, H$ and 
$L_1, \ldots, L_k$ is not empty, then $F_{i+1} = C' \cup F_i'$, 
where $C'$ is $\leftarrow L_1, \ldots, L_k$, and $\Delta_{i+1} = \Delta_i$; 

C4) If H is a non-ground observable and $L_j$ is any 
non-observable selected literal from $L_1, \ldots, L_k$, 
then $F_{i+1} = C' \cup F_i'$, where $C'$ is the set of all 
resolvents of clauses in $T \cup \Delta_i$ with 
$H \leftarrow L_1, \ldots, L_k$ on the selected literal $L_j$, and 
$\Delta_{i+1} = \Delta_i$; if $L_j$ is observable the resolutions 
are done only with clauses in O; 

C5) If H is empty, $L_j$ is any selected literal and $L_j$ 
is not observable, then $F_{i+1} = C' \cup F_i'$, where 
$C'$ is the set of all resolvents of clauses in $T \cup \Delta_i$ 
with $\leftarrow L_1, \ldots, L_k$ on the literal $L_j$ and 
$\square \notin C'$, and $\Delta_{i+1} = \Delta_i$; if $L_j$ is observable the 
resolutions are done only with clauses in O. 

In step C1) we “reason forward” from the 
conclusion H, trying to generate a ground observable at 
the head. Once this happens, if this observable is not 
“true”, steps C2) and C3) give the denial of the conditions 
that imply this observable. Step C4) reasons backward 
from the conditions, either failing or trying to 
instantiate the observable head further. Step C5) 
reasons backward from the denials of steps C2), C3) 
until every possible such backward reasoning branch 
fails. Note that in the backward reasoning steps 
observables are resolved from the observations O and 
not the theory. More importantly notice that we do not 
reason forward from an observable that is true. 

Note that we have included the set of hypotheses $\Delta_i$ 
in the definition of the corroboration derivation 
although it is not affected by this part of the 
procedure. The reason for this is that more efficient 
extensions of the procedure can be defined by adding 
extra abducible information to $\Delta_i$ during the 
corroboration phase, e.g. the required absence of some 
abducible A can be recorded by the addition of a new 
abducible A*. 

Theorem 

Let $\langle T, O, A, S \rangle$ be a Hypothetico-Deductive framework 
and G a ground atomic formula. If $(\leftarrow G\ \{\})$ has an 
abductive derivation to $(\square\ \Delta)$ then the set $\Delta$ is a 
corroborated explanation for G. 

Proof (Sketch) 

The soundness of the abductive derivations follows 
directly from the soundness of SLD resolution for 
definite Horn theories as every abductive derivation 
step of this procedure can be mapped into an SLD 
resolution step. To show that the explanation $\Delta$ is 
corroborated let $A \in S$ be any ground atomic logical 
consequence of $T \cup \Delta$. Since $T \cup \Delta$ is a definite Horn 
theory, A must belong to its minimal model, which can 
be constructed in terms of the immediate consequence 
operator T [van Emden & Kowalski, 1976]. Hence 
there exists a finite integer n such that $A \in T_{T \cup \Delta} \uparrow n\,(\emptyset)$, 
and A does not follow from T alone by our 
assumption on the form of the theory T. The result 
then follows by induction on the length of the 
corroboration derivation. 
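
The finite stage n used in the proof sketch can be illustrated by iterating the immediate consequence operator on a ground definite program. The code below is an illustration only: the (head, body) encoding, the function names `tp` and `stage`, and the three-clause toy program are assumptions introduced here, with an explanation atom `d` written as a clause with an empty body.

```python
def tp(rules, interpretation):
    """One application of the immediate consequence operator to a set of atoms."""
    return frozenset(head for head, body in rules
                     if all(b in interpretation for b in body))


def stage(rules, atom, limit=1000):
    """Smallest n such that `atom` is in the n-th iterate from the empty set.

    Returns None if the iteration reaches a fixpoint (or the hypothetical
    `limit`) without deriving `atom`.
    """
    interpretation = frozenset()
    for n in range(1, limit + 1):
        next_interpretation = tp(rules, interpretation)
        if atom in next_interpretation:
            return n
        if next_interpretation == interpretation:
            return None
        interpretation = next_interpretation
    return None


# Hypothetical program: the theory {s <- d, g <- s} plus the explanation {d}.
PROGRAM = [("d", ()), ("s", ("d",)), ("g", ("s",))]
print(stage(PROGRAM, "g"))
```

In this toy program `d` appears at stage 1, `s` at stage 2 and `g` at stage 3, while an atom that is not a consequence is never reached.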

6 Application of Hypothetico- 
deductive Reasoning 

In this section we will illustrate hypothetico- 
deductive reasoning with some examples. Before this it 
is worth pointing out that existing abductive diagnosis 
techniques (e.g. [Poole et al., 1987], [Davis, 1984], 
[Cox & Pietrzkowski, 1987], [Genesereth, 1984], 
[Reggia et al., 1983], [Sattar & Goebel, 1989]) can be 
accommodated within the HD framework. For example 
in the diagnosis of faults in electrical circuits 
hypothetico-deductive reasoning exhibits similar 
behaviour to [Genesereth, 1984], [Sattar & Goebel, 
1989]. 

Problems and domains which are ideally suited to the 
application of hypothetico-deductive reasoning exhibit 
two characteristics. Firstly, they have a large number of 
possible explanations in comparison to the number of 
empirical consequences of each of those explanations. 
Secondly, they have a minimal amount of observational 
data pertaining to a given explanation so that 
corroboration failure is maximized. 

To illustrate the manner in which general 
hypothetico-deductive reasoning deals with differing 
but compatible explanations, let us consider the 
example of abdominal pain first presented by [Pople, 
1985] and axiomatized in [Sattar & Goebel, 1990]. The 
axioms are reproduced below. To allow the possibility 
of several diseases occurring simultaneously, the three 
expressions which capture the fact that the symptoms 
(nausea, irritation_in_bowel, and heartburn) are 
incompatible, have been omitted. 

Theory T2 

abdominal_pain_symp(X) → has_abdominal_pain 
problem_is(indigestion) → abdominal_pain_symp(nausea) 
problem_is(dysentry) → abdominal_pain_symp(irritation_in_bowel) 
problem_is(acidity) → abdominal_pain_symp(heartburn) 



Now consider the following observations: 

Observations O 

has_abdominal_pain 

abdominal_pain_symp(nausea) 



Abducibles, A = 

{ problem_is(indigestion), 
problem_is(dysentry), 
problem_is(acidity) } 

Observables, S = 

{ has_abdominal_pain, 
abdominal_pain_symp(nausea), 
abdominal_pain_symp(irritation_in_bowel), 
abdominal_pain_symp(heartburn) } 

There are three potential explanations for the 
observation “has_abdominal_pain”. Since they are not 
mutually incompatible (it is possible to have all three 
diseases, for example), there is no crucial literal which 
can help us distinguish between the three explanations. 
There is thus no “best” explanation from this point of 
view. 

From the point of view of hypothetico-deductive 
reasoning however, one of the explanations stands apart 
from the others. On the basis of all the currently 
available evidence “problem_is(indigestion)” is 
completely corroborated. The two remaining 
explanations remain possible but uncorroborated; that 
is to say there is no supplementary evidence in support 
of them. Experiments might be performed (testing for 
“abdominal_pain_symp(irritation_in_bowel)”, and 
“abdominal_pain_symp(heartburn)”) which could 
corroborate one or both of the others, which would lead 
us to extend our explanation. Since physical 
incompatibilities are rare in common-sense reasoning, 
hypothetico-deductive reasoning has an advantage in 
being able to offer a (revisable) “best” explanation 
based on the currently available evidence, in spite of the 
absence of possible crucial experiments. It is important 
to appreciate that it is usually impractical to simply 
construct the hypotheses by performing abduction on 
all the observations in O, since in general there may be 
an extremely large number of them. Moreover, only a 
few may be relevant to the particular observation for 
which we seek an explanation. 
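
The treatment of theory T2 can be reproduced with a small ground sketch of the interleaved procedure. Several simplifications are assumptions of this illustration and are not taken from the paper: the rule with the variable X is expanded by hand into its three ground instances, goals are processed left to right, the theory is assumed acyclic, and corroboration of a new abducible is checked by forward chaining rather than by a corroboration derivation. The names `pain`, `consequences` and `explain` are hypothetical.

```python
def pain(symptom):
    return f"abdominal_pain_symp({symptom})"


# Ground instances of theory T2 as (head, body) pairs.
T2 = [
    ("has_abdominal_pain", (pain("nausea"),)),
    ("has_abdominal_pain", (pain("irritation_in_bowel"),)),
    ("has_abdominal_pain", (pain("heartburn"),)),
    (pain("nausea"), ("problem_is(indigestion)",)),
    (pain("irritation_in_bowel"), ("problem_is(dysentry)",)),
    (pain("heartburn"), ("problem_is(acidity)",)),
]
O = {"has_abdominal_pain", pain("nausea")}
A = {"problem_is(indigestion)", "problem_is(dysentry)", "problem_is(acidity)"}
S = {"has_abdominal_pain", pain("nausea"),
     pain("irritation_in_bowel"), pain("heartburn")}


def consequences(rules, facts):
    """Least set of atoms containing `facts` and closed under `rules`."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            if head not in derived and all(b in derived for b in body):
                derived.add(head)
                changed = True
    return derived


def explain(goal, delta=frozenset()):
    """Yield hypothesis sets for the list of atoms `goal`."""
    if not goal:
        yield delta
        return
    first, rest = goal[0], goal[1:]
    if first in A:
        if first in delta:                      # hypothesis already adopted
            yield from explain(rest, delta)
        else:                                   # adopt only if corroborated
            candidate = delta | {first}
            if consequences(T2, candidate) & S <= O:
                yield from explain(rest, candidate)
        return
    if first in S and first not in O:           # unobserved observable: reject
        return
    for head, body in T2:                       # reason backward through T2
        if head == first:
            yield from explain(list(body) + rest, delta)


explanations = list(explain(["has_abdominal_pain"]))
print(explanations)
```

Under these assumptions the only hypothesis set returned is the one containing `problem_is(indigestion)`; the branches through the two unobserved symptoms are abandoned, which corresponds to the other two explanations remaining uncorroborated on the current evidence.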



It might be thought that the checking of all the 
observational consequences of some explanation might 
be equally impractical: there might be an infinite 
number of them as well. However, it must be borne in 
mind that we are only considering the representation of 
common-sense; we would normally ensure that there 
are only a small number of observable consequences in 
which we would be interested. We would define our set 
of observables, S, accordingly. So, for instance, in the 
fermentation example below we represent certain 
critical times (often referred to as "landmarks") at which 
we might perform observations. Similarly, in the 
“stolen car” example which we present later, we restrict 
observables to events that occurred at some specific 
point in time. 

One application area in which incomplete 
information is intrinsic, is that of temporal reasoning. 
Reasoning about time is constrained by the fact that 
factual information is only available concerning the 
past and the present. By its very nature we must 
perform temporal diagnosis with no knowledge about 
the future states of the systems we are trying to model. 

As an example of temporal diagnosis which 
illustrates this characteristic, consider an industrial 
process involving the fermentation of wine. Suppose we 
are faced with the task of diagnosing whether the 
fermentation process has proceeded normally, or whether 
the extremely rare conditions have occurred under 
which we will produce a vintage wine. To do this we 
must carry out a test at some time after the wine- 
making process has begun, such as measuring its pH, its 
relative density, or its alcohol content. Suppose further 
that we need to decide on this diagnosis before a certain 
time, e.g. the bottling-time tomorrow. Let us refer to 
some property of the mixture which would be observed 
for vintage wine by the symbol p1, and that for 
ordinary wine as p2. These two properties might be 
entirely compatible: it is perfectly possible for ordinary 
wine to be produced under conditions which exhibit 
p1 (as well as p2), but in such a case it is not the fact that 
the mixture is ordinary wine that causes p1 to be 
observed. Now suppose we observe p1 before the 
bottling time, and suppose there are no further 
observational consequences for the “vintage wine” 
hypothesis that are observable before tomorrow. Then 
the “vintage wine” hypothesis is completely 
corroborated within the defined time-scale. On the 
other hand, the “ordinary wine” hypothesis remains at 
best only partially corroborated. Hypothetico-deductive 
reasoning would then prefer the “vintage wine” 
hypothesis over the “ordinary wine” one. The 
temporal dimension illustrates the ability of 
hypothetico-deductive reasoning to form diagnoses on 
the basis of incomplete information. Notice that an 
extension of the time scale would revise the status of the 
observable relations and perhaps the “vintage wine” 
hypothesis would become only partially corroborated. 
The application of hypothetico-deductive reasoning to 
the temporal domain will be discussed in more detail in 
the next section as an important special case of the 
integration of hypothetico-deductive reasoning and 
default reasoning. 
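
The effect of the time scale on corroboration in the wine example can be sketched as follows. Everything beyond p1, p2 and the two hypotheses is an assumption made for illustration: the numeric time stamps, the extra later prediction `p3` for vintage wine, and the function name `corroboration` are hypothetical, and counting corroborated predictions is just one simple way to compare hypotheses.

```python
# Each hypothesis maps to its predicted observables with the (hypothetical)
# time at which each becomes observable.
PREDICTIONS = {
    "vintage_wine": [("p1", 1), ("p3", 5)],   # p3 is a hypothetical later prediction
    "ordinary_wine": [("p2", 1)],
}
OBSERVED = {"p1"}


def corroboration(hypothesis, observed, horizon):
    """Return (corroborated, due): predictions observable by `horizon` and how many were observed."""
    due = [atom for atom, time in PREDICTIONS[hypothesis] if time <= horizon]
    hits = [atom for atom in due if atom in observed]
    return len(hits), len(due)


# Before the bottling time (horizon 2) and on an extended time scale (horizon 5).
print(corroboration("vintage_wine", OBSERVED, horizon=2))
print(corroboration("ordinary_wine", OBSERVED, horizon=2))
print(corroboration("vintage_wine", OBSERVED, horizon=5))
```

Within the shorter horizon every due prediction of the “vintage wine” hypothesis has been observed, whereas the prediction of “ordinary wine” has not. Extending the horizon brings the unobserved `p3` into scope, so “vintage wine” becomes only partially corroborated.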

7 Hypothetico-deduction with Default 
Theories 

As we discussed above, the aim of hypothetico- 
deductive reasoning has been to provide a framework 
in which we can tackle one of the main characteristics 
of common sense reasoning, namely incomplete 
information. More specifically it addresses the fact that 





